ABSTRACT

Speaker-independent word recognition is performed, 
based on a small acoustically distinct vocabulary, with 
minimal hardware requirements. After a simple precon- 
ditioning filter, the zero crossing intervals of the input 
speech are measured and sorted by duration, to provide 
a rough measure of the frequency distribution within 
each input frame. The distribution of zero crossing 
intervals is transformed into a binary feature vector, 
which is compared with each reference template using 
a modified Hamming distance measure. A dynamic time 
warping algorithm is used to permit recognition of various
speaker rates, and to economize on the reference
template storage requirements. A mask vector for each 
reference template is used to ignore insignificant (or 
speaker-dependent) features of the words detected. 

18 Claims, 5 Drawing Sheets

SPEAKER-INDEPENDENT WORD RECOGNIZER

BACKGROUND AND SUMMARY OF THE INVENTION

The present invention relates to a speaker-independent
speech recognizer, that is, to a machine capable of
automatically recognizing and decoding speech from an
unknown human speaker.

There are many applications where it would be highly
desirable to have such a speaker-independent speech
recognizer configured for a small vocabulary. For example,
such a word recognizer would be extremely useful for
automotive controls and video games. If even a very small
control vocabulary were available, many non-critical
automotive control functions which frequently require the
driver to remove his eyes from the road could be done by
direct voice inputs. Control of a car's radio or sound
system could be usefully accomplished in this manner. The
more sophisticated monitoring and computational functions
available in some cars could also be more efficiently met
with a voice query/voice output system. For example, if a
driver could say "fuel" and have his dashboard reply
verbally "seven gallons-refuel within 160 miles," this
would be very convenient in automotive control design.
Similarly, an arcade video game could be designed to accept
a limited set of verbal inputs such as "shoot", "pull up",
"dive", "left", and "right". These applications, like many
others, are extremely cost sensitive.

Thus, to provide a word recognizer for the large body of
applications of this type, it is not necessary that the
recognizer be able to recognize a very large vocabulary. A
small vocabulary, e.g. 6 to 20 words, can be extremely
useful for many applications. Secondly, it is not necessary
that a word recognizer for such applications be able to
recognize a word embedded in connected speech. Recognition
of isolated words is quite sufficient for many simple
command applications. Third, in many such applications,
substitution errors are much more undesirable than
rejection errors. For example, if a consumer is making
purchases from a voice-selected vending machine, it is much
more desirable to have the machine reply "input not
understood" than to have the machine issue the wrong item.

Thus, it is an object of the present invention to provide a
low cost word recognizer system which has a very low rate
of substitution errors.

It is highly desirable to have such word recognizer systems
operate with a low computational load. In many attractive
applications, a modest error rate can easily be tolerated
(e.g. 85% accurate recognition), but the cost requirements
are stringent. Thus, it would be highly desirable to have a
word recognizer which could be implemented with an ordinary
cheap 8 bit microcomputer, together with cheap analog
chips, but without requiring any high speed chips or
dedicated processors. Of course, it is always possible to
do speaker-independent word recognition using a
minicomputer or a main frame, but such an implementation
has no practical relevance to most of the desirable
applications, since most of the applications are
cost-sensitive.

Thus, it is an object of the present invention to provide a
speaker-independent word recognizer which can be
implemented with an ordinary 8-bit microcomputer, and does
not require any high-speed or special-function processing
chips.

It is a further object of the present invention to provide
a speaker-independent word recognizer for a limited
vocabulary which can be implemented using an 8-bit
microcomputer and analog chips.

A further problem in speaker-independent recognition has
been the preparation of an appropriate set of templates.
Any one speaker, or any set of speakers with a [illegible]
accent, may pronounce a certain word consistently with
certain features which will not be replicated in the
general population. That is, the templates for
speaker-independent vocabulary [illegible] not [illegible]
of a word which is not a strictly [illegible] feature. It
is always possible to prepare a set of reference templates
using empirical optimization, but this is time consuming,
and also precludes the possibility of user-generation of
reference templates in the field.

Thus, it is a further object of the present invention to
provide a speech recognizer for which the preparation of
templates requires minimal empirical input [illegible].

It is a further object of the present invention to provide
a method for preparing vocabulary templates for a
[illegible]

[illegible] speaker-independent word recognizer for
cost-sensitive systems is memory requirements. That is, it
is highly desirable in many systems where small
microcomputers are to be used not to tie up too much memory
with the word recognition algorithm and templates. In
particular, in applications for portable devices (e.g. a
calculator or watch which can receive spoken [illegible]),
the power requirements of unswitched memory impose a
critical constraint. Since vocabulary templates must be
saved during power-off periods, the amount of memory (CMOS
or nonvolatile) required for speech reference templates is
a very important cost parameter.

Thus, it is a further object of the present invention to
provide a speaker-independent word recognizer which has
absolutely minimal memory requirements for storing
reference templates.

A further problem in any word recognizer, which is most
particularly important in a speaker-independent word
recognizer, is that speakers will typically vary, not only
in their average rate of speech, but in their timing of the
syllables within a given word. Since this information is
not normally used by human listeners in making a word
recognition decision, it will typically vary substantially
among the speech patterns of different speakers. It is
therefore necessary that a speaker-independent word
recognizer be insensitive to a reasonable range of
variation in the average rate and localized timing of human
speech.

It is therefore a further object of the present invention
to provide a speaker-independent word recognizer which is
reasonably insensitive both to average rate and to
localized variations in timing of human speech.

It is a further object of the present invention to provide
a speech recognition system which is reasonably insensitive
both to average rate and to localized variations in timing
of human speech, which can be implemented using a simple
microcomputer with no expensive custom parts required.

A further characteristic which it would be desirable to
implement in a speaker-independent word recognition system
is the capability for vocabulary change. Thus, for example,
in a calculator which can be addressed by spoken commands,
it would be desirable to
have the set of spoken commands be variable with dif- 
ferent modules (for example), or to be user variable as 
user-customized software is loaded into the calculator. 

However, to accomplish this, it is desirable that the
reference template set preparation be based on reasonably
simple exclusion algorithms, so that a reasonably
unskilled user can prepare a new template set. It is also
necessary that the template set be addressable, so that
templates can be downloaded and substituted.

It should also be noted that the capability to change
templates is sensitive to the memory space required for
each template. That is, if the memory templates can be
stored reasonably compactly, then a mask location can
be used to indicate which subset of all possible stored
templates corresponds to the currently active vocabulary.
Thus, for example, in an automotive control system,
a master vocabulary might contain only a set of
words indicating various areas of control functions,
such as "radio", "wipers", "engine", "computer", etc.
After any one of these function areas has been selected,
a new localized set of reference templates would
then be used for each particular function area. Each
localized set of reference templates would have to include
one command to return to the master template set,
but otherwise could be fully customized. Thus, a localized
set of commands for radio control could include
such commands as "FM", "AM", "higher", "lower",
"frequency", "volume", etc.

Thus, it is a further object of the present invention to
provide a speaker-independent recognizer which func- 
tions on a limited vocabulary, but in which the vocabu- 
lary set can be easily changed. 

It is a further object of the present invention to provide
a speaker-independent recognizer which functions
on a limited vocabulary, but in which the vocabulary 
set can be easily changed, which can be implemented 
using simple commercially available microcomputer 
parts. 

A further desirable option in speaker-independent
word recognizer systems is the capability to function in
a speaker-dependent mode. That is, in such applications
as automobile controls or speech-controlled calculators,
it is necessary that the systems be shipped from the
factory with a capability to immediately receive speech
input. However, many such devices will typically be
used only by a limited set of users. Thus, it is desirable
to be able to adapt the template set of a speaker-independent
device to be optimized for a particular user or group
of users. Such re-optimization could be used to increase
the vocabulary size or lower the error rate in service,
but requires that the process of modifying templates be
reasonably simple.

Thus, it is a further object of the present invention to
provide a speaker-independent word recognizer which
can be re-optimized easily to operate in a speaker-dependent
mode, for a specific speaker or for a limited group
of speakers.

Thus, it is a further object of the present invention to
provide a speaker-independent word recognizer, which
can be easily re-optimized to operate in a speaker-dependent
mode for a specific speaker or for a limited group
of speakers, and which can be economically configured
using a simple microcomputer and simple analog parts.

Speaker-independent word recognition is performed,
based on a small acoustically distinct vocabulary, with
minimal hardware requirements. After a simple preconditioning
filter, the zero crossing intervals of the input
speech are measured and sorted by duration, to provide
a rough measure of the frequency distribution within
each input frame. The distribution of zero crossing
intervals is transformed into a binary feature vector,
which is compared with each reference template using
a modified Hamming distance measure. A dynamic time
warping algorithm is used to permit recognition of various
speaker rates, and to economize on the reference
template storage requirements. A mask vector for each
reference template is used to ignore insignificant
(speaker-dependent) features of the words detected.

To achieve these and other objects of the invention, 
the present invention comprises: 
A word recognizer, comprising: 
input means for receiving an analog input signal cor- 
responding to speech; 
a signal processor, said processor conditioning said 
input signal according to a predetermined charac- 
teristic and measuring zero crossing intervals of 
said conditioned signal to provide a binary feature 
vector; 

distance measurement means, said distance measure- 
ment means comparing said binary feature vector 
with each of a plurality of binary reference vectors 
(said reference vectors being organized in sequen- 
ces corresponding to words) to provide a distance 
measure at least partially corresponding to a Ham- 
ming distance measure with one of said feature 
vectors; and 

recognition means for recognizing words in accor- 
dance with the sequence of said distance measures 
between each said sequence of said reference vec- 
tors and successively received ones of said feature 
vectors. 

According to a further embodiment of the present 
invention, the present invention comprises: 

A word recognizer, comprising: 

input means for receiving an analog input signal cor- 
responding to speech; 

a signal processor, said processor conditioning said 
input signal according to a predetermined charac- 
teristic and measuring zero crossing intervals of 
said conditioned signal to provide a feature vector; 

distance measurement means, said distance measure- 
ment means comparing said feature vector with 
each of a plurality of reference vectors (said refer- 
ence vectors being organized in sequences corre- 
sponding to words) to provide a distance measure 
at least partially corresponding to a Hamming dis- 
tance measure with respect to said reference vec- 
tors for each successive one of said feature vectors; 

recognition means for recognizing words in accor- 
dance with the sequence of said distance measures 
between each said sequence of said reference vec- 
tors and successively received ones of said refer- 
ence vectors and successively received ones of said 
feature vectors, said recognizer also performing a 
dynamic programming step to provide an optimal 
subsequence match between successively received 
ones of said feature vectors and said sequences of 
reference vectors. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention will be described with refer- 
ence to the accompanying drawings, wherein: 

FIG. 1a shows a block diagram of the word recognizer
of the present invention;

FIG. 1b is a graphical representation of an analog
speech signal with respect to time after initial
conditioning thereof, but prior to zero-crossing detection
in the word recognizer of FIG. 1a;

FIG. 1c is a schematic representation of the speech
signal of FIG. 1b following its subjection to zero-crossing
detection, in the word recognizer of FIG. 1a;

FIG. 2 shows a block diagram of the preferred hardware
implementation of the word recognizer according
to the present invention;

FIG. 3 is a schematic diagram indicative of the end of
word window operation used to identify word endings
in the preferred embodiment of the present invention;

FIG. 4 shows a schematic indication of the processing
of raw speaker inputs to achieve the mask vector for
a reference template set, according to an empirical
unanimity factor; and

FIG. 5 shows an example of the classification of a
speech input according to its acoustic segmentation.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention includes several points of novelty,
and also can be implemented in numerous different
ways. Thus, the following description will suggest a
number of modifications and variations of the present
invention, without thereby implying that the present
invention is limited to any specific embodiments
thereof.

FIG. 1a shows generally the organization of the operations
used in the word recognizer of the present invention.
That is, a raw speech waveform 10 in the form of
an analog speech signal is first subjected to signal
conditioning including extremely simple prefiltering
operations, e.g. to reject out of band signals, and wherein
a pre-amplifier and an analog differentiator 12 may further
act upon the analog speech signal 10. The speech
signal may then generally take the form of the waveform
illustrated in FIG. 1b. In the latter respect, it will
be observed from FIG. 1b that the analog signal is
generally sinusoidal and traces an undulating path above
and below a "zero" polarity axis during the entire time
duration including the respective time instants t1, t2, t3,
and t4. The time instants t2, t3, and t4 identify the
zero-crossings of the waveform illustrated in FIG. 1b in
which a transition occurs in the polarity sign of the
waveform. Thus, the time instant t2 identifies the
zero-crossing of the waveform as it moves from a "plus"
polarity to a "minus" polarity; t3 identifies the
zero-crossing of the waveform as it moves from a "minus"
polarity to a "plus" polarity; and t4 identifies the
zero-crossing of the waveform as it moves from a "plus"
polarity to a "minus" polarity.

Signal conditioning of the analog speech signal continues
by monitoring the pre-amplified and differentiated
speech signal with a zero-crossing detector 13 to
sense each zero-crossing of the speech signal. The
zero-crossing detector 13 duly counts each polarity
transition in the speech signal and assigns a time instant
when each such zero-crossing occurs. FIG. 1c schematically
represents the speech signal of FIG. 1b, by indicating the
signal polarity at each of the time instants t1, t2, t3, and
t4, as sensed by the zero-crossing detector 13.

After the signal conditioning effected by the pre-amplifier
and analog differentiator 12, and the zero-crossing
detector 13, the conditioned speech signal is
exposed to a signal processing unit 15, which may take
the form of a microcomputer, for further signal processing
to enable electronic word recognition of the analog
input speech signal 10 to take place. Word recognition
is generally indicated by the declaration output 25 from
the microcomputer 15 as determined by the decision
logic thereof. In the latter respect, referring to FIG. 2,
the signal processing unit 15 is illustrated with dashed
lines and includes a feature extractor 14 which receives
the conditioned speech signal as an input. The feature
extractor 14, in the presently preferred embodiment,
simply measures the intervals between zero crossings of
the digital waveform received from the signal conditioner
12, 13, and then simply sorts the various zero
crossing interval measurements received during any
one frame into bins, to provide an integer feature vector
which gives a rough measurement of frequency distribution
during that frame. The elements of the integer
feature vector are then compared with various thresholds,
to provide a binary feature vector. This provides
the basic feature measurement. Note that no
digital-to-analog conversion is required.

The distance measurement 16 then compares the
feature vector provided by the feature extractor 14 as
an output along the line 19 with the feature vector
received from a template storage portion 18. It is
important to note two key features of the invention at this
time. First, the storage 18 contains not only a feature
vector for each template but also a mask vector. The
mask vector is used to introduce don't care weightings
into the feature vector stored with each reference
template, as will be discussed below. Thus, the set of
features of an input frame on which comparison for
recognition is performed is selected, and can vary for each
frame of each word template. Note that the various
word templates stored in storage 18 each comprise a
sequence of frames. That is, in a typical case each word
template might comprise a sequence of 8-12 frames.
(Each of the reference frames is expected to correspond
to 2 of the 20 millisecond input frames, but can be
warped to correspond to only one of the frames or to as
many as 4 of the input frames, as described below.) The
time alignment operation 20 selects the best match of
each of the reference templates to the current input
frame sequence, and provides running measurements of
the raw match as output. A word end detector 22 also
provides along line 23 the input to high level decision
logic 24, and the word end measurement, together with
the running cumulative word fit provided by the time
alignment block 20, provide the basis for the high level
decision logic 24 to make the word recognition decision
which is provided as a declaration output 25.

The operation of these various components of the
invention will now be discussed in greater detail. The
speech signal 10, which is typically raw analog input
from a microphone (and typically a preamplifier), is
provided to a signal conditioner 12, 13.

The filter functions preferably performed by the signal
conditioner 12, 13 include only an extremely simple
filtering operation. In the presently preferred embodiment,
the signal conditioner 12, 13 comprises a low pass
filter with a corner frequency of 6.25 KHz to reject out
of band signals, an analog differentiator, and a Schmitt
trigger. The differentiator effectively emphasizes the
high frequency components in the input signal. That is,
the zero crossing characteristics of a signal can easily be
dominated by a strong low frequency component, and
the use of the first derivative as the function on which
zero crossing analysis is performed minimizes this problem.
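
The effect of the differentiator on the zero crossing count can be illustrated numerically. The following is only a sketch under stated assumptions: the two-tone test signal (100 Hz at full amplitude plus 2 kHz at 0.3 amplitude), the simulation rate, and the function names are hypothetical and are not taken from the specification, and a first difference stands in for the analog differentiator.

```python
import math


def count_zero_crossings(samples):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))


def first_difference(samples):
    """Discrete stand-in (assumption) for the analog differentiator."""
    return [b - a for a, b in zip(samples, samples[1:])]


RATE = 1_000_000   # hypothetical simulation rate, samples per second
N = 10_000         # 10 ms of signal

# Strong 100 Hz component plus a weaker 2 kHz component (illustrative values).
signal = [
    math.sin(2 * math.pi * 100 * n / RATE)
    + 0.3 * math.sin(2 * math.pi * 2000 * n / RATE)
    for n in range(N)
]

raw_count = count_zero_crossings(signal)
diff_count = count_zero_crossings(first_difference(signal))
```

In this sketch the raw signal crosses zero only near the zero crossings of the strong low frequency tone, whereas the differentiated signal crosses zero at roughly every half-cycle of the 2 kHz tone, because differentiation scales each component by its frequency.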
It should be noted that the filtering functions are not
necessarily so minimal. In particular, the zero crossing
characteristics of a speech signal are highly sensitive to
the frequency preemphasis and also to the phase shifting
introduced by a prefiltering section, and a wide variety
of prefiltering characteristics may optionally be used,
e.g., to provide a more critical distinction between the
words in a given vocabulary set or to reject particular
noise characteristics. That is, the prefiltering
characteristics will substantially affect the extent to which
perceptually distinct input frames are measurably distinct
in the very limited information provided by the recognition
algorithm of the present invention. A wide variety
of such filtering characteristics could be introduced in
modifications and variations of the present invention,
but the principal embodiment of the present invention
uses only simple processing functions as noted.

In addition, bandpass filtering can also be used in the 
signal conditioner, to reject out of band signals, al- 
though this is not used in the presently preferred em- 
bodiment. 

It should be noted that the Schmitt trigger performs 
a rather important signal processing function, namely 
center-clipping. That is, where zero crossings of a function
including noise are being measured, even a very
low noise power, at moments when the function value is 
near zero, can introduce many spurious zero crossings. 
To avoid this problem in recognition, center-clipping 
(using the hysteresis characteristics of the Schmitt trig- 
ger) in effect ignores zero crossings unless the wave- 
form reaches a certain minimum value between two 
adjacent zero crossings. Although a Schmitt trigger is 
not the only way to accomplish this center-clipping, 
some such function in the signal conditioner is highly 
desirable, since it greatly reduces the noise in the low- 
interval zero crossings. 
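
The center-clipping behavior described above can be sketched in software. This is a minimal illustration, not the analog circuit of the preferred embodiment: the function names, the symmetric threshold, and the sample values are assumptions chosen for the example.

```python
def schmitt_trigger(samples, threshold, initial_state=1):
    """Return a +1/-1 state per sample, with hysteresis.

    The state flips to -1 only when the input falls below -threshold,
    and back to +1 only when it rises above +threshold, so small
    excursions around zero are ignored.
    """
    state = initial_state
    out = []
    for x in samples:
        if state > 0 and x < -threshold:
            state = -1
        elif state < 0 and x > threshold:
            state = 1
        out.append(state)
    return out


def count_transitions(values):
    """Count sign changes between consecutive values."""
    return sum(1 for a, b in zip(values, values[1:]) if (a >= 0) != (b >= 0))


# Hypothetical input: two genuine swings with low-level noise near zero.
noisy = [0.5, 0.02, -0.03, 0.01, -0.04, -0.6, -0.02, 0.03, 0.7]

naive_crossings = count_transitions(noisy)                        # 4
clipped_crossings = count_transitions(schmitt_trigger(noisy, 0.1))  # 2
```

With a threshold of 0.1 the small excursions near zero no longer register, and only the two genuine polarity changes remain.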

The actual zero crossing information can be obtained 
in a variety of ways, as is obvious to those skilled in the 
art. For example, the analog input signal can be applied 
to the Schmitt trigger mentioned, or to a polarity sens- 
ing saturated output amplifier, to provide a strongly 
clipped signal, i.e., a sequence of rectangular wave- 
forms of alternating sign. These waveforms can then be 
converted to logic levels and provided as inputs to a 
microcomputer which counts the duration of each rect- 
angular waveform portion (that is, the duration of each 
interval between zero crossings) in terms of clock cy- 
cles of the microcomputer. Of course, this function 
could easily be embodied in SSI logic, with a flip-flop 
and counters, or otherwise, but the embodiment in a 
microprocessor or microcomputer is preferred. The 
clock resolution of the microprocessor is preferably 
plus or minus 40 microseconds or less, but most com- 
mercial microprocessors can meet this. For example, an 
8080, a Z-80, or a TMS 7000 would all be suitable. 
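
The interval measurement described above, counting clock cycles between sign changes of the clipped waveform, can be sketched as follows. The function name and the convention of one list entry per clock tick are assumptions for illustration; the leading run before the first crossing is discarded here because its true start is unknown, which is a choice of this sketch rather than something the text specifies.

```python
def zero_crossing_intervals(states):
    """Return durations, in clock ticks, between successive sign changes.

    `states` holds one +1/-1 logic level per clock tick (assumption).
    """
    intervals = []
    if not states:
        return intervals
    ticks = 0
    seen_first_crossing = False
    prev = states[0]
    for s in states[1:]:
        ticks += 1
        if s != prev:
            # A sign change: record the elapsed ticks, except for the
            # partial run that precedes the first observed crossing.
            if seen_first_crossing:
                intervals.append(ticks)
            seen_first_crossing = True
            ticks = 0
            prev = s
    return intervals


clipped = [1, 1, 1, -1, -1, 1, 1, 1, 1, -1]
measured = zero_crossing_intervals(clipped)   # [2, 4]
```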

The next step in processing the speech signal is to 
generate counts of the zero-crossing interval distribu- 
tion in each frame of a sequence of frames, spaced at a 
frame period. In the presently preferred embodiment, 
the frame period is 20 msec, but this frame period can 
easily be varied. If a longer frame period is used, rapid 
speech may not be well recognized, but this may be an 
acceptable tradeoff in some applications for the sake of
lower processor load. Similarly, a shorter frame period 
imposes a higher processor load, but provides a rela- 
tively slight gain in performance. Thus, frame periods in 
the range of 1 to 200 msec are within the scope of the 
present invention. 
It should be noted that the input is not necessarily
even divided into frames prior to this stage. That is, an
advantage of using a microprocessor to measure the
zero crossing intervals is that the microprocessor can at
the same time impose the initial division of the analog
input signal into frames.

At each frame, a feature vector is generated from the
input as follows: first, the RMS energy of the analog
signal is measured over an interval which need not
exactly coincide with the frame period. For example, in
the presently preferred embodiment, the energy is measured
over an analysis window of 30 msec. This provides
some smoothing of the energy values between
frames, and precludes missing short high-energy events.
In addition, the zero crossing intervals are classified at
this time. Again, the analysis window over which the
characteristics of the zero crossings are measured need
not be exactly the same as the frame period, and is 30
msec in the presently preferred embodiment.
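
The relationship between the 20 msec frame period and the 30 msec analysis window can be sketched with sampled data. The 80 microsecond sample period is the one stated for the operational embodiment; the function names and the placement of each window (starting at the frame boundary and running 10 msec into the next frame) are assumptions of this sketch, since the text does not say how the window is positioned.

```python
import math

SAMPLE_PERIOD_US = 80    # one sample = 80 microseconds
FRAME_PERIOD_MS = 20     # frame period of the preferred embodiment
WINDOW_MS = 30           # analysis window of the preferred embodiment

FRAME_STEP = FRAME_PERIOD_MS * 1000 // SAMPLE_PERIOD_US   # 250 samples
WINDOW_LEN = WINDOW_MS * 1000 // SAMPLE_PERIOD_US         # 375 samples


def analysis_windows(samples):
    """Yield one overlapping analysis window per frame (placement assumed)."""
    for start in range(0, len(samples) - WINDOW_LEN + 1, FRAME_STEP):
        yield samples[start:start + WINDOW_LEN]


def rms_energy(window):
    """Root-mean-square value of the samples in one analysis window."""
    return math.sqrt(sum(x * x for x in window) / len(window))


windows = list(analysis_windows([1.0] * 1000))
energies = [rms_energy(w) for w in windows]
```

Because each window is longer than the frame step, consecutive windows overlap by 125 samples (10 msec), which is what smooths the energy values between frames.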

Thus, to generate the feature vector, the zero crossing
intervals of the analog waveform over a 30 msec
interval are examined. The presently preferred method
of extracting a feature vector from the multiple zero
crossing interval numbers is as follows, but of course a
wide range of other expedients could be used to provide
a feature vector representative of the distribution of
zero crossing intervals. In the presently preferred
embodiment, the zero crossing intervals within the 30 msec
analysis window are sorted into four "bins", where
each bin generally corresponds to a bandpass filter.
That is, bin 1 counts the number of zero crossing intervals
within the analysis window which have durations
between seven and 13 samples (one sample is equal to 80
microseconds in this embodiment); bin 2 counts the
number of zero crossing intervals in the analysis window
with a duration between four and six samples; bin
3 counts the number of zero crossing intervals in the
analysis window with a duration of two or three samples;
and bin 4 counts the number of zero crossing intervals
in the analysis window which have a duration of
one sample. These numbers are preferably accumulated
by the microcomputer as the clipped rectangular waveform
is received, so that the actual durations of the
various zero crossings need not be stored at any point.
That is, when the clipped input waveform changes sign,
the microcomputer preferably notes the number of
clock pulses since the last change of sign, increments the
count in the appropriate bin by one, and resets its count
register and begins to count the number of clock pulses
until the next zero crossing. Thus, the number of zero
crossings counted in any one "bin" corresponds generally
to the energy which would have been measured
through a corresponding bandpass filter, and the
distribution over all of the bins provides an integer feature
vector, which in the presently preferred embodiment
includes four integer counts.
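
The bin sorting just described can be sketched as follows, using the four duration ranges stated above. The function names are hypothetical, and the handling of intervals longer than 13 samples (they are simply ignored) is an assumption, since the text does not say what happens to them. Note that an interval of one 80 microsecond sample is a half-cycle of a 6.25 kHz tone, which matches the corner frequency of the low pass filter.

```python
# (shortest, longest) interval in samples for bins 1 through 4.
BIN_RANGES = [(7, 13), (4, 6), (2, 3), (1, 1)]
SAMPLE_PERIOD_US = 80


def bin_counts(intervals):
    """Return [bin1, bin2, bin3, bin4] counts for one analysis window."""
    counts = [0, 0, 0, 0]
    for duration in intervals:
        for index, (shortest, longest) in enumerate(BIN_RANGES):
            if shortest <= duration <= longest:
                counts[index] += 1
                break
        # Durations outside every range are ignored (assumption).
    return counts


def half_cycle_hz(duration_samples):
    """Frequency whose half-cycle lasts the given number of samples."""
    return 1_000_000 / (2 * duration_samples * SAMPLE_PERIOD_US)


counts = bin_counts([1, 1, 2, 3, 5, 7, 13, 14, 20])   # [2, 1, 2, 2]
```

Only the running counts are kept, so the individual interval durations never need to be stored, as the paragraph above notes.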

Next, this integer feature vector is converted to a
binary feature vector as follows. For example, the count
found in bin 3 is compared to two thresholds to generate
elements five and six of the binary feature vector: if the
count is greater than a threshold B3L, then element 5 of
the binary feature vector is set at 1 (and otherwise
remains at zero); if the count in bin 3 is less than a second
threshold B3U, then a 1 is entered in element 6 of the
binary feature vector (which also otherwise remains at
zero). That is, each bin has lower and upper thresholds,
which are empirically chosen to maximize the discrimination
between words used. In the presently preferred
embodiment, the eight thresholds used, expressed in
number of sample values, are:

Bin number    Lower threshold    Upper threshold
    1                1                  4
    2                4                  8
    3                8                 16
    4               16                 32

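
The conversion from four bin counts to an eight-bit binary feature vector can be sketched using the thresholds tabulated above. The function name is hypothetical, and applying the bin 3 rule (one bit for "greater than the lower threshold", one bit for "less than the upper threshold") uniformly to all four bins is an inference from the statement that each bin has lower and upper thresholds.

```python
# (lower, upper) threshold for bins 1 through 4, from the table above.
THRESHOLDS = [(1, 4), (4, 8), (8, 16), (16, 32)]


def binary_feature_vector(counts):
    """Map four bin counts to eight bits, two per bin.

    For bin k, element 2k-1 is 1 when the count exceeds the lower
    threshold, and element 2k is 1 when the count is below the upper
    threshold; both otherwise stay at 0.
    """
    bits = []
    for count, (lower, upper) in zip(counts, THRESHOLDS):
        bits.append(1 if count > lower else 0)
        bits.append(1 if count < upper else 0)
    return bits


sparse = binary_feature_vector([2, 1, 2, 2])      # [1, 1, 0, 1, 0, 1, 0, 1]
dense = binary_feature_vector([5, 9, 20, 40])     # [1, 0, 1, 0, 1, 0, 1, 0]
```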



Again, it should be noted that the presently operational
embodiment has used high-rate high-resolution
sampling as an initial step, and hence the zero
crossing intervals are expressed in samples, but the con- 
templated best mode of the present invention would not 
use such expensive high-rate high-resolution sampling, 
and would use analog stages initially instead, as dis- 
cussed above. 

Thus, the foregoing process has produced a feature
vector (eight bits in the presently preferred embodiment)
for each frame of the input signal. This feature
vector is compared with various reference vectors according
to a distance measure, and word recognition is
made in accordance with the sequence of distance measures
between the sequence of reference vectors in a
word template and all or some of the sequence of input
feature vectors received.

The distance measure used is essentially a Hamming 
distance measure between the input frame and any par- 
ticular reference frame, but there is one important addi- 
tional point of novelty in the distance measure. A mask ^ 
vector is preferably stored along with each reference 
vector, to mask the less significant elements of the refer- 
ence vectors. Thus, the actual template for a word con- 
sists of a sequence of pairs of binary vectors: for each 
reference frame in the word template, both a reference 35 
feature vector and a mask vector are stored. For exam- 
ple, if the fourth element of a mask vector is 1, then the 
fourth element of the corresponding reference vector is 
used in the distance computation. If the mask vector 
element is zero, the corresponding element of the asso- 40 
ciated reference vector is not used in the distance com- 
putation. Thus, the distance between the test feature 
vector TF, and the i-th reference vector RF(i) and mask 
vector RM(i) is defined by the following logical opera- 
tion: 45 

$$D_{TF,i} = \mathrm{Hamming}\bigl(RM(i)\ \mathrm{.and.}\ (TF\ \mathrm{.xor.}\ RF(i))\bigr)$$

Thus, $D_{TF,i}$ is the Hamming distance between the test 
vector TF and the i-th vector set of the template for the 
given word. This distance indicates the number of dissimi- 
larities between test feature vector TF and reference 
vector RF(i), with masking defined by the zero valued 
elements in the mask vector RM(i). 
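
The masked distance above can be illustrated with a minimal Python sketch. It assumes each eight-bit vector is packed into an integer; the function name and the sample bit patterns are hypothetical and are not taken from the patent's Fortran appendix.

```python
def masked_hamming(tf: int, rf: int, rm: int) -> int:
    """Count the unmasked bit positions where test and reference differ.

    tf: test feature vector, rf: reference feature vector,
    rm: mask vector (a 1 bit means "use this element").
    """
    # XOR marks differing bits; AND with the mask discards ignored bits.
    return bin(rm & (tf ^ rf)).count("1")


# Hypothetical example values, chosen only for illustration.
tf = 0b10110010
rf = 0b10010110
rm = 0b11110000  # only the upper four elements are significant
print(masked_hamming(tf, rf, rm))
```

With an all-ones mask the function reduces to an ordinary Hamming distance, and with an all-zeros mask every comparison returns zero.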

It should be recognized that this use of a mask vector 
to exclude insignificant components of each reference 
vector is broadly novel, and may be modified and var- 
ied widely. For example, it is not strictly necessary that 
the feature and reference vectors be binary, since a 
binary masking value may be used to mask the results of 
an analog subtraction step as well. In fact, it is not even 
strictly necessary that the mask vector itself be binary, 
although this is greatly preferable. If the mask vector is 
allowed to take on analog values, then it functions es- 
sentially as a weight vector. A weighting vector is still 
useful for disregarding insignificant bits in a recognition 
comparison, but an analog weighting vector does not 
offer nearly the computational efficiency which is pro- 
vided by a binary mask vector. Moreover, preparation 
of a binary mask vector for a given word recognition set 
can be performed very simply and efficiently, as will be 
described below. 

In addition, it should be noted that the novelty in the 
use of a masking vector is not by any means limited to 
use of an eight-bit feature vector, nor to recognition 
applications where the essential feature vector extrac- 
tion step is based on zero crossing intervals, but can be 
applied to any speech recognition system whatsoever. 

The method by which the reference vectors in a 
word template are generated will now be described. 

To construct a template, the starting point is a large 
number of independent samples of that word as pro- 
nounced by a population which is sufficiently diverse to 
approximate that whose speech the recognizer will be 
required to recognize. For example, if the speech recog- 
nizer in use will be exposed to spoken inputs from men, 
women and children having all of the common regional 
accents, then the initial data base should also be ob- 
tained from men, women, and children, and should also 
include as many as possible of the regional accent types 
which must be recognized in use. 

Correspondingly, if the recognizer is to be operated 
in a speaker-dependent mode or is to be customized for 
a small group of speakers, the number of samples must 
remain large, but the speakers within the relevant set 
will be proportionately represented. For example, if 
four speakers are to be recognized, each should contrib- 
ute an average of 25 samples to the data base. 

The first step is a manual classification step. Suppose 
that a template is to be constructed for the word "stop". 
This word has six distinct acoustic segments as shown in 
the spectrogram of FIG. 5. These are the initial sibi- 
lant /s/, stop and release /t/, vowel portion /A/, and the 
final stop and release /p/. These six segments are prefer- 
ably marked interactively, with the aid of spectrogram 
and waveform plots, on graphics terminals, for each 
sample in the data base. Thus, this step manually estab- 
lishes the location of corresponding acoustic segments 
within the data base sampled. This step is used because 
various speakers will vary the relative length of differ- 
ent acoustic segments within a word, and it is necessary, 
when estimating from the data sample what feature 
vector would correspond to a vowel /A/, that the com- 
putation not be distorted by time-warped samples in 
which the speaker was actually pronouncing a /t/ or 
a /p/. Of course, this time-alignment of the samples in 
the data base could be accomplished other ways, includ- 
ing automatic classification of the samples by acoustic 
segment boundaries according to, e.g., LPC character- 
istics, but the presently preferred embodiment uses 
manual classification at this step. 

Thus, after this manual classification step, the seg- 
ment within each sample in the data base which corre- 
sponds to the successive acoustic segments which must 
be included in the reference vector has been established. 
The average duration of each of the acoustic segments 
is then computed to establish the number of reference 
frames needed in each segment. For example, suppose 
the sibilant /s/ has an average duration of 130 msec. 
Then, at a reference frame period of 40 msec, three 
reference frames in the word template will correspond 
to the sibilant /s/. (The period of the reference frame in 
the presently preferred embodiment is exactly twice as 
long as the frame interval in the input speech, for rea- 
sons which will be discussed below.) 
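
The frame-count arithmetic in the example above can be sketched as follows. The text gives only the single 130 msec case, so rounding to the nearest whole frame (with at least one frame per segment) is an assumption, as is the function name.

```python
def reference_frame_count(avg_segment_ms: float, ref_frame_ms: float = 40.0) -> int:
    """Number of reference frames allotted to one acoustic segment.

    Assumption: round to the nearest whole frame, never fewer than one.
    """
    return max(1, int(avg_segment_ms / ref_frame_ms + 0.5))


# The /s/ example from the text: 130 msec at a 40 msec reference frame period.
print(reference_frame_count(130.0))
```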
The next step is to locate, in each of the 100 samples, 
which portions of the sample shall be included in the 
computation of the expected characteristics of each of 
the three reference frames in the word template which 
are to correspond to the sibilant /s/. That is, in this 
example the three /s/ reference feature vectors should 
be computed based on measurements at three points 
evenly spaced within the duration of the phoneme /s/, 
for each sample in the data base. Thus, the result of this 
process is that, for each frame in the word template for 
which a reference vector must be computed, a unique 
location within each sample in the data base to corre- 
spond to that frame has been identified. 

By way of example, FIG. 4 generally illustrates the 
process by which each reference vector in the word 
template is computed, based on the corresponding por- 
tions of the various samples in the data base. First, a 
tolerance number called a unanimity factor (nu) is 
chosen empirically. In the presently preferred embodi- 
ment nu is set equal to 0.93, but may be greater or lesser 
depending on the uniformity of the speakers in the data 
base, and to some extent on the size of the data base. 
However, in the presently preferred embodiment, a 
value greater than 90% is preferably used, and is prefer- 
ably in the range of 90 to 97%. 

The unanimity factor nu tells how many disagree- 
ments can be tolerated on any particular bit of the fea- 
ture vector for corresponding frames before concluding 
that there is no concurrence of behavior over the popu- 
lation. That is, for example, suppose that nu is chosen 
equal to 0.93. In this case, if 93 or more of 100 samples 
in the data base have a value for the first analog parame- 
ter (at corresponding frame locations) which is larger 
than B1L, then the first elements of the reference fea- 
ture vector and of the mask vector are set equal to 1. If 
93 or more of the samples have a value for the first 
parameter which is below B1L, then the first element of 
the reference feature vector is 0 and the first element of 
the mask vector is 1. That is, in this case the population 
would have agreed that the general behavior is to have 
the number of zero crossing intervals in the first "bin" 
less than the threshold B1L. However, if fewer than 93 
samples agree in this respect, then the first element of 
the mask vector is set equal to zero, and the first element 
of the reference vector is a "don't care" value and can 
be 0 or 1. 

Thus, this process generates a word template which is 
a time ordered sequence of vector pairs, namely a fea- 
ture vector and a mask vector at each reference frame 
interval. 

The basic distance measure, which compares one 
frame of speech input to some one frame of a word 
template, has been described above. However, the word 
identification rests not merely on testing the identity of 
frame to frame, but rather on finding the similarity of a 
sequence of input frames to the sequence of reference 
vectors in a word template. In the presently preferred 
embodiment, a dynamic programming algorithm is used 
to find an optimal subsequence match between the se- 
quence of reference vectors in the word template and a 
subsequence of feature vectors in the speech input. This 
dynamic programming algorithm permits time-warping 
by various speakers to be accommodated, but also has 
other advantages. In particular, the end points can be 
unconstrained. That is, no separate decision step needs 
to be made as to which frame in an input sequence of 
frames the end point of the word template should be 
identified to. Moreover, a second advantage of this 
approach is that the storage requirements are reduced, 
since the reference frame interval is twice the frame 
interval imposed on the speech input signal. In general, 
the unconstrained end point approach is accomplished 
by providing a cumulative cost profile, for each point in 
time, which assumes that the current input frame is the 
last frame. However, to economize on processor time, 
the preferred embodiment uses an end-of-word window 
instead, as will be discussed below. 

Thus, the foregoing steps produce a scalar dissimilar- 
ity measure $D_{N,j}$ which shows the dissimilarity between 
an input frame j and a reference frame N. This dissimi- 
larity measure is then transformed, through a dynamic 
programming procedure, into a minimal subsequence 
distance measure (scanning error) $E_{N,j}$, which is prefer- 
ably defined as follows: 

$$E_{N,j} = D_{N,j} + \min\{E_{N-1,j-1} + K,\ E_{N-1,j-2},\ E_{N-1,j-3} + K/3,\ E_{N-1,j-4} + K\}$$

The quantity "K" is a constant which is optionally 
used to impose a warping penalty. That is, the expected 
ratio of reference frames to sample frames is one refer- 
ence frame to every two sample frames. However, if 
this is not in fact the actual spacing, then a penalty 
amount is added to the minimal subsequence distance 
for each reference frame at which this ratio is not 
observed. Note that the penalty added where the ratio is 
locally 3-1 is much smaller than that imposed where the 
ratio is locally 4-1 or 1-1. Only a modest penalty is added 
where input speech is slightly slower than the refer- 
ence speech rate (down to 1 1/2 times as slow), but a sub- 
stantially larger penalty is added if the input speech is 
faster than the reference speech, or is more than 1 1/2 
times as slow as the rate expected by the reference 
speech. 

That is, where input frames are matched to reference 
frames at an average rate which is between 2:1 and 3:1, 
and where the time distribution of the input frames is the 
same as that of the reference frames, then the particular 
mappings of reference frame onto input frame in 
the optimal subsequence will vary between every other 
input frame and every third input frame, and the total 
speed-mismatch penalty will be a linear function of the 
speech rate mismatch. However, where the warping of 
the input sample is sufficiently nonlinear that, within the 
optimal subsequence, some adjacent pairs of the refer- 
ence template sequence match either adjacent input 
frames or input frames which are separated by three 
other input frames, an additional penalty will be added 
to the smooth penalty for linear warping. This addi- 
tional penalty may be referred to as a nonlinear warping 
penalty. It should be noted that nonlinear warping is 
penalized in this way only where it causes some local 
portion of the reference-to-input mapping to be denser 
than 1:2 or sparser than 1:3. Thus, this warping penalty 
incorporates speech-rate information into the recogni- 
tion process, but does not require large additional 
amounts of computation time. 

The warping penalty is optional, and is not strictly 
necessary for practicing the present invention. That is, 
the iterative statement of the dynamic programming 
measure can be restated, with K set to zero, as 

$$E_{N,j} = D_{N,j} + \min\{E_{N-1,j-1},\ E_{N-1,j-2},\ E_{N-1,j-3},\ E_{N-1,j-4}\}$$
To minimize the computational load, the presently 
preferred embodiment does not use warping penalties. 

Alternatively, a larger than 2-to-1 warping factor can 
be permitted, or a sparser ratio of reference templates to 
input frames could be used, as desired. The warping 
penalties can accordingly be widely varied. 
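
A minimal Python sketch of the dynamic programming recursion follows. It is an illustration under stated assumptions, not the appendix code: the per-step penalty weights (none for a two-frame step, K/3 for a three-frame step, K for one-frame and four-frame steps) are chosen to agree with the prose description of the warping penalty, the first reference frame is allowed to start at any input frame, and all function names are hypothetical.

```python
import math


def masked_hamming(tf, rf, rm):
    """Masked Hamming distance between packed binary vectors."""
    return bin(rm & (tf ^ rf)).count("1")


def subsequence_error(inputs, template, k=0.0):
    """Return E[last reference frame][j] for every input frame j.

    inputs: list of packed feature vectors, one per input frame.
    template: list of (reference_vector, mask_vector) pairs.
    k: warping penalty constant; k=0 disables the penalty.
    """
    # Assumed penalty per step size (input frames advanced per reference frame).
    steps = {1: k, 2: 0.0, 3: k / 3.0, 4: k}
    # Assumption: the first reference frame may align with any input frame.
    prev = [float(masked_hamming(x, *template[0])) for x in inputs]
    for rf, rm in template[1:]:
        cur = [math.inf] * len(inputs)
        for j, x in enumerate(inputs):
            best = min(
                (prev[j - s] + p for s, p in steps.items() if j - s >= 0),
                default=math.inf,
            )
            cur[j] = masked_hamming(x, rf, rm) + best
        prev = cur
    return prev


# Hypothetical two-frame template matched against four input frames.
template = [(0b1111, 0b1111), (0b0000, 0b1111)]
inputs = [0b1111, 0b1111, 0b0000, 0b0000]
print(subsequence_error(inputs, template, k=3.0))
```

A recognizer built along these lines would take the minimum of the returned values over the candidate word-ending frames and compare that minimum across the vocabulary.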

The foregoing dynamic programming procedure can 
provide a cumulative fit measure for each word in the 
vocabulary, at each input frame interval. In this case, 
the recognizer is capable of operating in a connected- 
speech recognition mode rather than an isolated-speech 
recognition mode. 

However, this imposes a heavy additional processing 
load and is not the preferred embodiment of the inven- 
tion. 

That is, the processing load required to find a cumula- 
tive optimal subsequence match at each input frame 
interval is too much for the economical implementa- 
tions at which the present invention is especially di- 
rected. To reduce this processing load, words are pref- 
erably recognized only at word ending points identified 
by an end-of-word detector. The operation of the end- 
of-word detector will now be described with reference 
to FIG. 3. The end-of-word operation as depicted in 
FIG. 3 provides the integer feature vector 32 for each 
frame of input speech as an input to a plural frame 
buffer memory 34, which may include storage for 20 
frames of speech data for example. 

In the presently preferred embodiment, the zero 
crossings are not only sorted into bins, but a count is 
kept of the total number of zero crossings. For example, 
this can be done by adding together the counts in the 
various bins of the integer feature vector, depending on 
the bin threshold values. Alternatively, this can be done 
by simply keeping a direct running count of the number 
of zero crossings, and holding this as a directly com- 
puted parameter during each frame of input speech. A 
further alternative is simply to count the number of 
high-frequency zero crossings for each frame, and sum 
those across frames as at 36. 

The key test which is implemented in the end-of- 
word decision of this aspect of the present invention is 
to ascertain whether the number of zero crossings ex- 
ceeds a given threshold number as at 38 during a reason- 
ably long period of time (e.g. 300 milliseconds). If the 
threshold number is exceeded as at 40, this large number 
of zero crossings indicates that no low-frequency en- 
ergy, and therefore presumably no speech, is present 
during this 300 millisecond window. It should be noted 
that this is somewhat sensitive to the bias level used in 
the Schmitt trigger (or other center-clipping mecha- 
nism). That is, if the bias level in the Schmitt trigger is 
set too high, then noise at the end of a word, in a quiet 
environment, will not be able to produce the high num- 
ber of zero crossings required for the detection of end of 
word. Correspondingly, if the bias level is too low, a 
long unvoiced consonant (such as the s at the end of a 
word such as "guess") may generate enough high-fre- 
quency zero crossings to trigger the end of word detec- 
tor erroneously. 
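
The end-of-word test can be sketched in Python as a sliding-window sum of per-frame zero-crossing counts. The 15-frame window assumes a 20 msec input frame (inferred from the 40 msec reference frame being twice the input frame interval), and the threshold value and function name are hypothetical.

```python
from collections import deque


def end_of_word_frames(zc_counts, window_frames=15, threshold=300):
    """Return frame indices at which an end of word would be declared.

    zc_counts: per-frame (high-frequency) zero-crossing counts.
    window_frames: frames in the detection window (15 x 20 msec = 300 msec).
    threshold: assumed minimum total count indicating absence of speech.
    """
    window = deque(maxlen=window_frames)
    hits = []
    for i, count in enumerate(zc_counts):
        window.append(count)
        # Only test once a full window of frames has been buffered.
        if len(window) == window_frames and sum(window) > threshold:
            hits.append(i)
    return hits


# Hypothetical counts: 10 speech frames followed by 15 noise-only frames.
counts = [5] * 10 + [30] * 15
print(end_of_word_frames(counts))
```

In practice the threshold would have to be tuned together with the Schmitt trigger bias level, for the reasons given above.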

Thus, the end-of-word detector selectively indicates 
that an end-of-word has occurred. If so, then word 
recognition is performed on the assumption that the 
word will have ended during a second window period, 
which is not necessarily the same window over which 
the end-of-word operates. That is, in the presently pre- 
ferred embodiment, an end-of-word is detected when 
300 milliseconds have occurred without input speech 
energy, and the first 200 milliseconds of the 300 milli- 
second end-of-word detection window are then 
searched for a hypothetical word ending point. How- 
ever, this second window, during which an end-of- 
word is looked for, can be the same as or different from 
the end-of-word detection window, and can be varied 
within very broad parameters. In the example shown in 
FIG. 3 the end-of-word detection lies within the first 13 
frames of speech data included in the 20-frame buffer 
memory 34. The essential trade-off here is that, if the 
recognition window is made smaller, the processor load 
is reduced but the frequency of non-recognition errors 
is likely to be increased. 

The invention as presently practiced is embodied in a 
VAX 11/780, with analog input and output connections 
(i.e., microphone, preamplifier, analog-to-digital con- 
verter, digital-to-analog converter, audio amplifier and 
loudspeaker), and is implemented in the Fortran code in 
the attached appendix which is hereby incorporated by 
reference. However, as discussed above, the present 
invention can be implemented in a cheap micro-com- 
puter system, and the contemplated best modes of the 
invention in the future are expected to be microproces- 
sor or microcomputer embodiments. 

In particular, an embodiment of the present invention 
in an 8-bit microprocessor system is believed to be 
straightforward. No expensive data converter chip or 
means for energy measurement is required. The only 
analog stages needed in the preferred embodiment are 
the low-pass filter, differentiator, and Schmitt trigger. If 
the present invention is embodied in a 16-bit system, the 
additional processing power and word length will mean 
simply that a slightly larger vocabulary can be accom- 
modated, and will also make development of the vocab- 
ulary templates slightly easier. 

As will be obvious to those skilled in the art, the 
present invention provides numerous broad points of 
novelty over the prior art of speech recognition. There- 
fore, the present invention can be embodied 
in numerous modifications and variations, and its scope is not 
limited except as specified in the accompanying claims. 

What is claimed is: 

1. A method for recognizing speech independent of 
the speaker thereof, said method comprising: 

receiving an analog input speech signal; 

conditioning said analog speech signal to produce a 
sequence of rectangular waveforms of polarity 
signs alternating between plus and minus polarities 
as a digital waveform signal; 

counting the number of polarity transitions in the 
digital waveform signal to obtain a zero-crossing 
count for each frame of the digital waveform sig- 
nal; 

measuring the time duration intervals between zero- 
crossings of the digital waveform signal; 

providing a sequence of binary feature vectors based 
upon the measurements of the time duration inter- 
vals between zero-crossings of the digital wave- 
form signal and corresponding to respective frames 
of the digital waveform signal; 

providing a vocabulary consisting of a relatively 
small number of words, wherein each of the words 
included in the vocabulary is represented by a plu- 
rality of binary reference vectors which have been 
organized in sequences with each of said binary 
reference vector sequences corresponding to a 
word acoustically distinct from the other words 
included in the vocabulary; 
comparing each of said binary feature vectors with 
each of said plurality of binary reference vectors; 
determining a distance measure with respect to each 
of said binary reference vectors for each successive 
binary feature vector in said sequence of binary 
feature vectors in response to the comparison 
therebetween; and 
recognizing words in accordance with the distance 
measures between each of said binary reference 
vector sequences and successively received binary 
feature vectors corresponding to respective frames 
of the digital waveform signal. 
2. A method for recognizing speech as set forth in 
claim 1, wherein the provision of said sequence of bi- 
nary feature vectors is accomplished by sorting the 
zero-crossing time duration interval measurements re- 
ceived during respective frames of the digital waveform 
signal into corresponding ones of a plurality of bins 
respectively representative of different time duration 
intervals between zero-crossings; 
counting the number of zero-crossing time duration 
intervals for each of the plurality of bins; 
comparing the counts of respective bins to upper and 
lower reference thresholds corresponding to the 
respective bins; and 
providing said sequence of binary feature vectors in 
response to the comparison between the counts of 
the respective bins and the upper and lower thresh- 
olds corresponding thereto. 

3. A method for recognizing speech as set forth in 
claim 1, further including 
establishing the identity of an end of word prior to 
the recognition of a word as a precondition thereto, 
the establishing of said end of word identification 
including: 
monitoring the zero-crossing count for the digital 
waveform signal, and 
declaring an end of word condition whenever the 
average frequency of said zero-crossings exceeds 
an end point target zero-crossing frequency for a 
time duration longer than a predetermined refer- 
ence time duration. 

4. The method of claim 1, wherein said distance meas- 
ure-determining step comprises a Hamming distance 
measurement. 

5. The method of claim 3, wherein said distance meas- 
ure-determining step comprises a Hamming distance 
measurement. 

6. The method of claim 1, wherein said recognizing 
step comprises a dynamic programming step to achieve 
an optimal subsequence match between one of said 
sequences of said binary reference vectors and spaced 
successive ones of said binary feature vectors. 

7. The method of claim 3, wherein said recognizing 
step comprises a dynamic programming step to achieve 
an optimal subsequence match between one of said 
sequences of said binary reference vectors and spaced 
successive ones of said binary feature vectors. 

8. The method of claim 4, wherein said recognizing 
step comprises a dynamic programming step to achieve 
an optimal subsequence match between one of said 
sequences of said binary reference vectors and spaced 
successive ones of said binary feature vectors. 

9. The method of claim 5, wherein said recognizing 
step comprises a dynamic programming step to achieve 
an optimal subsequence match between one of said 
sequences of said binary reference vectors and spaced 
successive ones of said binary feature vectors. 

10. The method of claim 1, wherein said conditioning 
step includes center clipping said analog input speech 
signal. 

11. The method of claim 3, wherein said conditioning 
step includes center clipping said analog input speech 
signal. 

12. The method of claim 10, wherein said center 
clipping step is performed by a Schmitt trigger. 

13. The method of claim 11, wherein said center 
clipping step is performed by a Schmitt trigger. 

14. The method of claim 1, wherein said conditioning 
step includes the performance of an operation corre- 
sponding to differentiation of said analog input speech 
signal. 

15. The method of claim 3, wherein said conditioning 
step includes the performance of an operation corre- 
sponding to differentiation of said analog input speech 
signal. 

16. A word recognition system for identifying a spo- 
ken word independent of the speaker thereof, wherein 
the spoken word is represented by an analog speech 
signal, said word recognition system comprising: 
signal conditioning means for receiving an analog 
input speech signal and producing a digital wave- 
form signal as a sequence of rectangular wave- 
forms of polarity signs alternating between plus 
and minus polarities, said signal conditioning 
means including a zero-crossing detector for count- 
ing the number of polarity transitions in the digital 
waveform signal to obtain a zero-crossing count 
for each frame of the digital waveform signal; 
memory means storing a plurality of binary reference 
templates of digital speech data respectively repre- 
sentative of individual words and comprising the 
vocabulary of the word recognition system, the 
vocabulary consisting of a relatively small number 
of words with each of the words included in the 
vocabulary being represented by a binary reference 
template defined by a predetermined plurality of 
binary reference vectors arranged in a predeter- 
mined sequence and comprising an acoustic de- 
scription of an individual word in a time-ordered 
sequence, each of said binary reference templates 
corresponding to a word acoustically distinct from 
the other words included in the vocabulary; 
means operably coupled to said signal conditioning 
means for extracting binary feature vectors from 
said digital waveform signal based upon the time 
duration intervals between zero-crossings of the 
digital waveform signal; 
means operably associated with said binary feature 
vector extracting means for comparing each binary 
feature vector of said digital waveform signal with 
the corresponding binary reference vectors of each 
of said binary reference templates to provide a 
distance measure with respect to each of the binary 
feature vectors and the predetermined binary refer- 
ence vector sequences defining acoustic descrip- 
tions of the respective words included in the vo- 
cabulary of the word recognition system; and 
word recognizing means for determining which one 
of the plurality of the binary reference templates is 
the closest match to said digital waveform signal 
representing said analog input speech signal based 
upon the distance measures of each of said binary 
reference vector sequences and successively re- 
ceived binary feature vectors corresponding to 
respective frames of the digital waveform signal. 

17. A word recognition system as set forth in claim 
16, further including dynamic programming means op- 
erably connected to the output of said comparing means 
for receiving the distance measures between each of 
said binary reference vector sequences and successively 
received binary feature vectors to provide an optimal 
subsequence match therebetween. 

18. A word recognition system as set forth in claim 
16, further including word-end detector means operably 
interposed between said zero-crossing detector of said 
signal conditioning means and said binary feature vec- 
tor extracting means for monitoring the zero-crossing 
count for the digital waveform signal and producing a 
signal output declaring an end of word condition when- 
ever the average frequency of said zero-crossings ex- 
ceeds an end point target zero-crossing frequency for a 
time duration longer than a predetermined reference 
time duration; and 
said word recognizing means including decision logic 
means having inputs for receiving the distance 
measures of each of said binary reference vector 
sequences and successively received binary feature 
vectors and the output from said word-end detec- 
tor means as a precondition to providing a word 
recognition output. 

[57] ABSTRACT 

A telephony channel simulation process is disclosed for 
training a speech recognizer to respond to speech obtained 
from telephone systems. An input speech data set is provided 
to a speech recognition training processor, whose bandwidth 
is higher than a telephone bandwidth. The process performs 
a series of alterations to the input speech data set to obtain 
a modified speech data set. The modified speech data set 
enables the speech recognition processor to perform speech 
recognition on voice signals from a telephone system. 

TELEPHONY CHANNEL SIMULATOR FOR 
SPEECH RECOGNITION APPLICATION 

This is a continuation of prior application Ser. No. 
07/948,031, filed Sep. 21, 1992, now abandoned. 

BACKGROUND OF THE INVENTION 

This invention relates to adapting a speech recognition sys- 
tem to be capable of operating over telephonic public 
switched networks. 

Speech recognition systems are well known to the art. 
Examples include the IBM Tangora ("A Maximum Likeli- 
hood Approach to Continuous Speech Recognition;" L. R. 
Bahl, F. Jelinek, R. Mercer; Readings in Speech Recogni- 
tion; Ed.: A. Waibel, K. Lee; Morgan Kaufmann, 1990; pp. 
308-319.) and Dragon Systems Dragon 30k dictation sys- 
tems. Typically, they are single user, and speaker-dependent. 
This requires each speaker to train the speech recognizer 
with his or her voice patterns, during a process called 
"enrollment". The systems then maintain a profile for each 
speaker, who must identify themselves to the system in 
future recognition sessions. Typically speakers enroll via a 
local microphone in a low noise environment, speaking to 
the single machine on which the recognizer is resident. 
During the course of enrollment, the speaker will be required 
to read a lengthy set of transcripts, so that the system can 
adjust itself to the peculiarities of each particular speaker. 

Discrete dictation systems, such as the two mentioned 
above, require speakers to form each word in a halting and 
unnatural manner, pausing, between, each, word. This 
allows the speech recognizer to identify the voice pattern 
associated with each individual word by using preceding, and 
following, silences to bound the words. The speech recog- 
nizer will typically have a single application for which it is 
trained, operating on the single machine, such as Office 
Correspondence in the case of the IBM Tangora System. 

Multi-user environments with speaker dependent speech 
recognizers require each speaker to undertake tedious train- 
ing of the recognizer for it to understand his or her voice 
patterns. While it has been suggested that the templates 
which store the voice patterns may be located in a common 
database wherein the system knows which template to use 
for a speech recognition by the speaker telephone extension, 
each speaker must none-the-less train the system before use. 
A user new to the system calling from an outside telephone 
line will find this procedure to be unacceptable. Also, the 
successful telephonic speech recognizer will be capable of 
rapid context switches to allow speech related to various 
subject areas to be accurately recognized. For example, a 
system trained for general Office Correspondence will per- 
form poorly when presented with strings of digits. 

The Sphinx system, first described in the Ph.D Disserta- 
tion of Kai-Fu Lee ("Large Vocabulary Speaker Indepen- 
dent Continuous Speech Recognition: The Sphinx System;" 
Kai-Fu Lee; Carnegie Mellon University, Department of 
Electrical and Computer Engineering; April 1988; CMU- 
CS-88-148), represented a major advance over previous 
speaker dependent recognition systems in that it was both 
speaker independent, and capable of recognizing words 
from a continuous stream of conversational speech. This 
system required no individualized speaker enrollment prior 
to effective use. Some speaker dependent systems require 
speakers to be reenrolled every four to six weeks, and 
require users to carry a personalized plug-in cartridge to be 
understood by the system. Also with continuous speech 
recognition, no pauses between words are required, thus the 
Sphinx system represents a much more user friendly 
approach to the casual user of a speech recognition system. 
This will be an essential feature of telephonic speech rec- 
ognition systems, since the users will have no training in 
how to adjust their speech for the benefit of the recognizer. 

A speech recognition system must also offer real time 
operation with a given modest vocabulary. However, the 
Sphinx System still had some of the disadvantages of the 
prior speaker dependent recognizers in that it was pro- 
grammed to operate on a single machine in a low noise 
environment using a microphone and a relatively con- 
strained vocabulary. It was not designed for multi-user 
support, at least with respect to the different locations, and 
multiple vocabularies for recognition. 

This invention overcomes many of the disadvantages of 
the prior art. 

OBJECTS OF THE INVENTION 

It is therefore an object of the present invention to provide 
a continuous speech speaker independent speech recognizer 
suitable for use with telephony equipment with input from 
speakers, both local and long distance. 
It is another object of the invention to train the system
from a vocabulary gathered in low noise conditions to
recognize speech patterns in a high noise (e.g., telephone)
environment.

It is another object of the invention to enable a plurality
of voice applications to be recognized by a speech
recognizer concurrently in a computer network or
telephonically.

SUMMARY OF THE INVENTION 

These and other objects are accomplished by a speech
recognition system architected on a client/server basis on
a local area or wide area network. The speech recognition
system is divided into a number of modules, including a front
end which converts the analog or digital speech data into a
set of Cepstrum coefficients and vector quantization values
which represent the speech. A back end uses the vector
quantization values and recognizes the words according to
phoneme models and word pair grammars, as well as the
context in which the speech is made. By dividing the
vocabulary into a series of contexts, that is, situations in
which certain words are anticipated by the system, a much
larger vocabulary can be accommodated with minimum memory.
As the user progresses through the speech recognition task,
contexts are rapidly switched from a common database (see the
Brickman, et al. Patent Application cited herein). The system
also includes an interface to a plurality of user
applications, which are also in the computer network.

The system includes training and task build modules, to
train the system and to build the word pair grammars for the
contexts, respectively.

The invention includes a telephony channel simulation
process for training a speech recognizer to respond to speech
obtained from telephone systems. The method begins by
inputting a speech data set, whose bandwidth is higher than
a telephone bandwidth, to a speech recognition training
processor. Then, the speech data set is decimated to obtain a
decimated speech data set having the telephone bandwidth.
Then, a bandpass digital filter, which characterizes the
transmission characteristics of telephone equipment, is
applied to the decimated speech data set. This is done to
obtain a filtered speech data set. Then, the amplitude of the
filtered speech data set is rescaled, so that its maximum
dynamic range matches the maximum range of uncompanded
telephone speech. This is done to obtain a rescaled speech
data set. Then, the rescaled speech data set is modified with
quantization noise representing a sequence of companding and
uncompanding a speech signal in a telephone system. This is
done to obtain a modified speech data set. Then, the modified
speech data set is input to a speech recognition processor,
to train statistical pattern matching data units. The method
results in the speech recognition processor being able to
perform speech recognition on voice signals from a telephone
system.

DESCRIPTION OF THE FIGURES 

These and other objects, features and advantages will be
more fully appreciated with reference to the accompanying
Figures.

FIG. 1 illustrates the logical architecture of a continuous 
speech recognition system, which includes the telephony 
channel simulator invention. 

FIG. 2 is a graph which characterizes the telephonic codec 
filter impulse response. 

FIG. 3 is a graph illustrating the magnitude response
versus normalized radian frequency.

FIG. 4 is a graph which illustrates the log magnitude
response versus normalized radian frequency.

FIG. 5 is a block diagram of a network for a recognition 
server operating in a telephony customer service call center. 

FIG. 6 is a flow diagram of a sequence of operational steps 
for a process for training a speech recognizer to respond to 
speech obtained from telephone systems. 

DETAILED DESCRIPTION OF THE
INVENTION

The bandwidth reductions and noise introduced by telephone
lines reduce the accuracy of all speech recognition
systems. This effect increases with the size of the vocabulary
that must be recognized at each moment in time. The use of
rapidly switchable speech recognition contexts is useful to
this invention, so that individual contexts can be limited in
size. Context switching is described in the copending U.S.
patent application Ser. No. 07/947,634, filed Sep. 21, 1992,
by N. F. Brickman, et al., entitled "Instantaneous Context
Switching For Speech Recognition Systems," assigned to the
IBM Corporation and incorporated herein by reference.

FIG. 1 illustrates the logical architecture of the IBM
Continuous Speech Recognition System (ICSRS) independent
of hardware configurations. At a broad level, ICSRS
consists of components addressing the following areas:

Data Acquisition: Data are converted in block 100 from
   analog to digital form, or potentially demultiplexed
   from other channels in the case of telephonic data.
Data Compression: The ICSRS Front End, blocks 102
   and 104, conditions, resamples, and compresses speech
   data streams to 300 bytes per second during the vector
   quantization step.
Speech Recognition: The Back End 106 performs the
   actual speech recognition by pattern matching phoneme
   models 192 using a grammar-guided beam search
   algorithm. The phoneme models 192 and word pair
   grammars 135 together constitute the recognition
   contexts. Single or multiple instances of Back-End
   recognizers can be deployed either remotely or locally to
   the Front-End instances which acquire and compress the
   speech data.
Task Building: The task building component 130 allows
   the construction of recognition contexts off-line,
   compiles the word pair grammars for use at run time, and
   binds appropriate phoneme models to the task (context).
Application Program Interface: The API 108 offers RPC
   based recognition services which allow data stream
   control, context loading, and activation.
Telephone Channel Simulator: The simulator 185 converts
   high bandwidth, high resolution speech data sets
   into telephone speech, having the reduced sampling rate,
   compressed bandwidth and compressed dynamic range of
   telephone speech, from which the phoneme models 192
   are trained.

During speech recognition, either a high bandwidth voice
data stream from a local microphone or a low bandwidth
voice data stream, such as would be associated with
telephony, is received by the Analog to Digital Conversion
block 100. The Analog to Digital Conversion 100 can be
performed by a hardware card such as the IBM M-Audio
Capture and Playback Adapter (M-ACPA) card in the voice
workstation. It has a digital signal processor which
processes either the high bandwidth or telephony bandwidth
signals and converts them to a series of digitally sampled
data points. This conversion could also be performed by a
digital PBX, with the telephony data streams provided in
8 KHz, 8-bit mu-law/a-law compressed format.

For purposes of the present invention, high bandwidth is 
defined as being a sampling rate of 16 kilohertz or above. 
Low bandwidth is defined as 8 kilohertz or below which is 
what the general telephone system in the United States uses 
for digital voice. The A/D conversion block 100 is optional 
as in a telephone system the digital information could come 
in from a private phone exchange (PBX). 

The first major block in the "front end" for speech
recognition is the Data Conditioning and Rate Conversion
(DCRC) block 102. The digitized input from the A/D
conversion 100 is at 44 or 8 kilohertz. A resampling
technique, referred to herein as decimation, is used as
provided by the public literature in the IEEE ("A General
Program to Perform Sampling Rate Conversion of Data by
Rational Ratios;" from "Programs for Digital Signal
Processing," Ed.: Digital Signal Processing Committee of the
IEEE Acoustics, Speech, and Signal Processing Society; IEEE
Press, 1979; Section 8.2, pp. 8.2-1 to 8.2-7 by R. E.
Crochiere). The DCRC 102 resamples and uses anti-aliasing
filters on the digitized signal to create either a 16 kilohertz
or 8 kilohertz data stream, for subsequent use. Both the
DCRC and Vector Quantization processes are described in
greater detail below.

After data conditioning and rate conversion in speech
recognition, the voice data is passed to the Vector
Quantization block 104. In Vector Quantization, the digital
data stream is segmented into frames of one-fiftieth of a
second duration, resulting in 320, 220, and 160 samples each
at 16 KHz, 11 KHz, and 8 KHz sampling rates, respectively. In
one preferred embodiment, a hundred frames per second are
computed from a speech signal of any bandwidth; the frames
overlap by fifty percent and have a Hamming window applied.
The Hamming window is well defined in the public literature
("Theory and Application of Digital Signal Processing,"
L. R. Rabiner, B. Gold; Prentice Hall, 1975, p. 91).
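
The framing just described can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names hamming_window and split_into_frames are hypothetical, the standard 0.54/0.46 Hamming coefficients are assumed, and a trailing partial frame is simply dropped.

```python
import math

def hamming_window(n):
    """Return an n-point Hamming window (standard 0.54 / 0.46 form)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def split_into_frames(samples, sample_rate_hz):
    """Cut a signal into 1/50 s frames that overlap by fifty percent."""
    frame_len = sample_rate_hz // 50   # 320 samples at 16 kHz, 160 at 8 kHz
    hop = frame_len // 2               # half-frame step gives ~100 frames/s
    window = hamming_window(frame_len)
    frames = []
    # Only full frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames

# One second of a constant signal at the telephone sampling rate.
one_second = [1.0] * 8000
frames = split_into_frames(one_second, 8000)
```

With exactly one second of input, 99 full frames fit; on a continuing stream the steady-state rate is the hundred frames per second stated above.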

After the voice data stream is broken into frames, the
vector quantization step extracts features from each frame.
In the extraction portion of the vector quantization step, a
series of parameters called the LPC Cepstrum Coefficients
are calculated. The Cepstrum Coefficients extract and
summarize some of the important characteristics of the speech
for pattern recognition. In each frame of data, a fiftieth of a
second of speech is encapsulated. One would expect to have
fifty frames per second; however, there is fifty-percent
overlap, so a hundred frames per second are generated. To
calculate the Cepstrum Coefficients, first a Hamming
window, which is a cosine bell, is applied to the voice data. A
Hamming window tapers the edges of each frame of voice data
to make the data extracted behave more like they would in
an infinite duration continuous Fourier Transform.

The Hamming windowed frames are pre-filtered using a
filter whose z-transform is 1.0 - 0.97*z^-1 ("Large
Vocabulary Speaker-Independent Continuous Speech
Recognition: The Sphinx System;" Kai-Fu Lee; Carnegie Mellon
University, Department of Electrical and Computer
Engineering; April 1988; CMU-CS-88-148, page 49) in order to
flatten the speech spectrum. Then 14 auto-correlation
coefficients are computed. The auto-correlation coefficients
are used to compute the Cepstrum coefficients in a manner
well known in the public literature, described in ("Digital
Processing of Speech Signals;" Prentice Hall Signal Processing
Series; 1978, pp. 401-402, 411-413). Thirteen Cepstral
coefficients are derived from the 14 auto-correlation
coefficients. Other numbers of auto-correlation coefficients
and other numbers of Cepstrum coefficients are possible.
The statistical properties of these coefficients are used
to guide the final vector quantization step.
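
The chain from a windowed frame to thirteen Cepstral coefficients can be sketched as follows. This is a hedged illustration, not the implementation used in the system: it assumes the textbook Levinson-Durbin recursion and the standard LPC-to-cepstrum recursion, and all function names (pre_emphasize, autocorrelation, levinson_durbin, lpc_to_cepstrum, frame_to_cepstrum) are hypothetical. The frame is assumed to have been Hamming windowed already.

```python
import random

def pre_emphasize(frame, coeff=0.97):
    """Apply the spectrum-flattening filter 1.0 - 0.97*z^-1."""
    return [frame[0]] + [frame[i] - coeff * frame[i - 1] for i in range(1, len(frame))]

def autocorrelation(frame, num_lags=14):
    """Auto-correlation values r[0] .. r[num_lags - 1]."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(num_lags)]

def levinson_durbin(r):
    """Predictor coefficients a[1..p] for x[n] ~ sum_k a[k] * x[n-k]."""
    p = len(r) - 1
    a = [0.0] * (p + 1)
    error = r[0]
    for i in range(1, p + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:]

def lpc_to_cepstrum(a):
    """c[m] = a[m] + sum_{k=1}^{m-1} (k/m) * c[k] * a[m-k], for m = 1..p."""
    p = len(a)
    c = [0.0] * (p + 1)
    for m in range(1, p + 1):
        c[m] = a[m - 1] + sum((k / m) * c[k] * a[m - k - 1] for k in range(1, m))
    return c[1:]

def frame_to_cepstrum(windowed_frame):
    """14 auto-correlation values -> 13 predictor -> 13 cepstral coefficients."""
    r = autocorrelation(pre_emphasize(windowed_frame), num_lags=14)
    return lpc_to_cepstrum(levinson_durbin(r))

# Synthetic noise frame (160 samples, i.e. 1/50 s at 8 kHz) used only as a demo.
rng = random.Random(0)
cepstrum = frame_to_cepstrum([rng.gauss(0.0, 1.0) for _ in range(160)])
```

As a sanity check on the recursions, an auto-correlation sequence of the form 1, 0.5, 0.25, ... yields a single non-zero predictor coefficient of 0.5 and cepstral values 0.5^m / m.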

Vector quantization is also used in the training process
190. The adjustment of the training data described below is
crucial in enabling the base Sphinx recognition engine to
operate over telephony equipment, and hence to the
invention described herein. In the training process 190, a
number of sentences are taken, currently between ten and
fifteen thousand, and segmented into frames, from which
auto-correlation and Cepstrum coefficients are calculated. A
clustering procedure is applied to segregate the Cepstrum
frame features into two hundred and fifty six classes using a
k-means type clustering procedure, described in ("An
Algorithm for Vector Quantizer Design;" Y. Linde, A. Buzo, R.
Gray, IEEE Transactions on Communications, Vol. Com-28,
No. 1, January 1980). The centers of these Cepstrum
clusters, and their class labels, taken together, are hereafter
referred to as "code books". The quantization code books
105 store the code book for telephone speech, generated by
acoustic training functions 190. They will also store a second
code book for high bandwidth speech.

For the final step of vector quantization, block 104 refers
to a code book in the quantization code books 105, FIG. 1,
derived in the training procedure just described, to
determine which cluster center is closest to the frame Cepstral
coefficients. The current frame is then assigned to the class
represented by that code book value. Since there are 256
classes, the VQ value is represented by one byte. There are
two other one-byte VQ values derived, from the Differential
Cepstrum and from the power in the frame. There are thus
three one-byte VQ values derived one hundred times per
second, resulting in a compression of the speech data stream
to 2,400 bits per second.
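
A minimal sketch of the code book lookup and of the data-rate arithmetic follows. The squared Euclidean distance is an assumption made for illustration (the text above says only "closest"), and the names nearest_code, quantize_frame and the toy code books are hypothetical; real code books would each hold 256 entries.

```python
def nearest_code(vector, code_book):
    """Index of the code book center closest to the vector.

    Squared Euclidean distance is assumed here for illustration.
    """
    best_index = 0
    best_distance = float("inf")
    for index, center in enumerate(code_book):
        distance = sum((v - c) ** 2 for v, c in zip(vector, center))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index

def quantize_frame(cepstrum, diff_cepstrum, power, code_books):
    """Reduce one frame to three one-byte VQ values."""
    return bytes([
        nearest_code(cepstrum, code_books["cepstrum"]),
        nearest_code(diff_cepstrum, code_books["diff_cepstrum"]),
        nearest_code([power], code_books["power"]),
    ])

# Three bytes per frame, one hundred frames per second.
BYTES_PER_FRAME = 3
FRAMES_PER_SECOND = 100
bits_per_second = BYTES_PER_FRAME * 8 * FRAMES_PER_SECOND

# Toy code books with two or three entries each, for demonstration only.
toy_books = {
    "cepstrum": [[0.0, 0.0], [1.0, 1.0]],
    "diff_cepstrum": [[0.0, 0.0], [0.5, -0.5]],
    "power": [[0.1], [1.0], [10.0]],
}
packed = quantize_frame([0.9, 1.2], [0.1, 0.0], 7.0, toy_books)
```

The same three bytes per frame also account for the 300 bytes per second quoted for the Front End.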

Part of the telephony invention herein described is that a
completely different code book, which characterizes the
speech for the recognizer, must be derived for the telephony
data and stored in the quantization code books 105 of FIG.
1. Another part of the invention is that a corresponding
phoneme model must be derived for the telephony data and
stored in phoneme models 192. A telephone appreciably
changes the speech signal, because of sampling rate
reductions, bandwidth compression, and dynamic range
compression. However, rather than using voice samples gathered
over the telephone, which involves a significant work effort,
high bandwidth samples can be processed to simulate the
telephone channel characteristics. This allows using the
large, readily available speech data files used in the initial
training of the Sphinx System, to enable telephonic speech
recognition. The telephone channel simulator is the
invention described here.

The telephone channel simulation is accomplished in a 
three phased process as follows: 
1.) Conversion to Telephone Bandwidth

A high bandwidth, high resolution speech data set (for
example, 16-bit resolution collected at either 44,100 Hz or
16,000 Hz) is input at block 180 in FIG. 1. Such data sets are
provided by the references ("Speech Corpora Produced on
CD-ROM Media by The National Institute of Standards and
Technology (NIST)," April 1991; "DARPA Resource
Management Continuous Speech Database (RM1) Speaker
Dependent Training Data," September 1989 NIST Speech
Discs 2-1.1, 2-2.1 (2 Discs) NTIS Order No. PB89-226666;
"DARPA Resource Management Continuous Speech
Database (RM1) Speaker-Independent Training Data," November
1989 NIST Speech Disc 2-3.1 (1 Disc) NTIS Order No.
PB90-500539; "DARPA Extended Resource Management
Continuous Speech Speaker-Dependent Corpus (RM2),"
September 1990 NIST Speech Discs 3-1.2, 3-2.2 NTIS
Order No. PB90-501776; "DARPA Acoustic-Phonetic
Continuous Speech Corpus (TIMIT)," October 1990 NIST
Speech Disc 1-1.1 NTIS Order No. PB91-0505065; "Studio
Quality Speaker-Independent Connected-Digit Corpus
(TIDIGITS)," NIST Speech Discs 4-1.1, 4-2.1, 4-3.1 NTIS
Order No. PB91-505592).

The input speech data set 180 is first resampled to 8,000
Hz using the resampling program described in ("A General
Program to Perform Sampling Rate Conversion of Data by
Rational Ratios;" from "Programs for Digital Signal
Processing," Ed.: Digital Signal Processing Committee of the
IEEE Acoustics, Speech, and Signal Processing Society;
IEEE Press, 1979; Section 8.2, pp. 8.2-1 to 8.2-7 by R. E.
Crochiere) in function block 182 of FIG. 1. This is done after
supplying it with a codec band-pass filter designed using a
modified version of the MAXFLAT routine described in
("Design Subroutine (MAXFLAT) for Symmetric FIR Low
Pass Digital Filters With Maximally-Flat Pass and Stop
Bands" from "Programs for Digital Signal Processing," Ed.:
Digital Signal Processing Committee of the IEEE Acoustics,
Speech, and Signal Processing Society; IEEE Press, 1979;
Section 5.3, pp. 5.3-1 to 5.3-6 by J. Kaiser) in function block
182 of FIG. 1. This filter is illustrated in FIGS. 2, 3, and 4.
The passband characteristic of this filter is designed to
closely approximate the coding/decoding, or "codec," filters
used in contemporary U.S. telephonic equipment. The
placement of the passband, the 3 dB points and the transition
bandwidths are critical to the effectiveness of the invention.
It is possible to design a codec filter for recognition training
which will provide good recognition on local telephone
lines, but not on long-distance lines. To avoid such a
problem, these characteristics should be placed at, say, 300
Hz for the lower 3 dB point and 3,600 Hz for the upper
3 dB point. The transition bandwidths should be 400 Hz and
800 Hz, respectively. This allows for a passband from 500
Hz to 3,200 Hz. The passband ripple must be no more than
0.1 percent deviation from unity throughout the passband,
to approximate the characteristics of actual codec filters.
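
The quoted passband follows from the 3 dB points and transition bandwidths if each 3 dB point is taken to lie at the center of its transition band. That centering is an assumption made here because it reproduces the 500 Hz and 3,200 Hz figures; the helper names below are hypothetical.

```python
import math

def passband_edges(lower_3db_hz, lower_transition_hz,
                   upper_3db_hz, upper_transition_hz):
    """Passband edges, assuming each 3 dB point is centered in its transition band."""
    lower_edge = lower_3db_hz + lower_transition_hz / 2.0
    upper_edge = upper_3db_hz - upper_transition_hz / 2.0
    return lower_edge, upper_edge

def ripple_in_db(percent_deviation):
    """Convert a percent deviation from unity gain into decibels."""
    return 20.0 * math.log10(1.0 + percent_deviation / 100.0)

edges = passband_edges(300.0, 400.0, 3600.0, 800.0)   # 500 Hz to 3,200 Hz
ripple_db = ripple_in_db(0.1)                          # under 0.01 dB
```

Expressed in decibels, the 0.1 percent ripple limit is well under a hundredth of a dB, which is why an ordinary equiripple or windowed design is not automatically adequate.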

It is important to note that the Sphinx recognition engine,
and other recognition engines such as the Tangora, are
sensitive to spectral distortions introduced by linear filters
which do not have flat frequency response in the passband,
since the primary speech recognition features are derived
from frequency spectra and their derivatives, such as
cepstra. Somewhat minor deviations from flat passband
response have been shown in our laboratory to result in
degradations of several percent in the absolute recognition
error rates, for complex recognition tasks. Therefore a
maximally flat design algorithm is required. The sensitivity
of the Sphinx recognition engine to "spectral tilt" has been
noted in ("Acoustical and Environmental Robustness in
Automatic Speech Recognition," A. Acero; Carnegie Mellon
University, Department of Electrical and Computer
Engineering; April 1990; CMU-CS-88-148). Therefore, a
MAXFLAT, or comparably low passband ripple, design is
required.

The rate conversions required to resample to 8,000 Hz
from 44,100 Hz are too demanding for the version of
MAXFLAT provided in ("Design Subroutine (MAXFLAT)
for Symmetric FIR Low Pass Digital Filters With
Maximally-Flat Pass and Stop Bands" from "Programs for Digital
Signal Processing," Ed.: Digital Signal Processing
Committee of the IEEE Acoustics, Speech, and Signal Processing
Society; IEEE Press, 1979; Section 5.3, pp. 5.3-1 to 5.3-6 by
J. Kaiser), and it provides only for the design of low-pass
filters, whereas a band-pass characteristic is required for the
codec type filter. The design characteristics for this
routine are given by two parameters, beta and gamma,
representing the 3 dB point and the transition bandwidth in
normalized frequency, with the Nyquist frequency mapping
to 0.5 and the sampling frequency mapping to 1.0. It is
suggested by Kaiser ("Design Subroutine (MAXFLAT) for
Symmetric FIR Low Pass Digital Filters With
Maximally-Flat Pass and Stop Bands" from "Programs for Digital
Signal Processing," Ed.: Digital Signal Processing
Committee of the IEEE Acoustics, Speech, and Signal Processing
Society; IEEE Press, 1979; Section 5.3, pp. 5.3-1 to 5.3-6 by
J. Kaiser) that gamma should be restricted to values ". . . not
much smaller than 0.05." Values lower than this require the
routine to be modified to increase the computing precision
of the floating point numbers used, and the filter coefficient
buffers to be expanded from 200 to 4096, since the number
of terms in such a filter is roughly proportional to twice
the inverse square of gamma. This accomplished, filters with
gamma values as small as 0.005, or about ten times smaller
than 0.05, which are required for the 44,100 Hz to 8,000 Hz
conversion, were designed. Two low-pass filter designs, a
low-pass to high-pass conversion, and a convolutional
combination of the high-pass and low-pass were required to
achieve the required band-pass characteristic.
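
The construction of a band-pass filter from two low-pass designs can be illustrated as below. Note the substitution: this sketch uses a Hamming-windowed sinc low-pass as a stand-in, not the MAXFLAT routine, so its passband is not maximally flat and its cutoffs are half-amplitude points rather than exact 3 dB points. The 16,000 Hz rate, the 201-tap length and all function names are assumptions chosen for the demonstration.

```python
import cmath
import math

def windowed_sinc_low_pass(cutoff_hz, sample_rate_hz, num_taps):
    """Odd-length linear-phase low-pass FIR, normalized to unity gain at DC."""
    fc = cutoff_hz / sample_rate_hz            # sampling frequency maps to 1.0
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2.0 * fc if k == 0 else math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def low_pass_to_high_pass(low_pass):
    """Spectral inversion: a centered unit impulse minus the low-pass filter."""
    high_pass = [-t for t in low_pass]
    high_pass[(len(low_pass) - 1) // 2] += 1.0
    return high_pass

def convolve(a, b):
    """Direct convolution; cascades two FIR filters into one."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def gain_at(taps, freq_hz, sample_rate_hz):
    """Magnitude of the filter's frequency response at one frequency."""
    w = 2.0 * math.pi * freq_hz / sample_rate_hz
    return abs(sum(t * cmath.exp(-1j * w * n) for n, t in enumerate(taps)))

FS = 16000
high_pass = low_pass_to_high_pass(windowed_sinc_low_pass(300.0, FS, 201))
low_pass = windowed_sinc_low_pass(3600.0, FS, 201)
band_pass = convolve(high_pass, low_pass)   # passes roughly 300 Hz to 3,600 Hz
```

Evaluating gain_at(band_pass, f, FS) at a few frequencies shows near-zero gain at DC, near-unity gain in mid-band, and strong attenuation above the upper cutoff.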

With this filter design accomplished, the 44,100 Hz data
was converted to 8,000 Hz in function block 182 of FIG. 1,
using the resample algorithm described in ("A General
Program to Perform Sampling Rate Conversion of Data by
Rational Ratios;" from "Programs for Digital Signal
Processing," Ed.: Digital Signal Processing Committee of the
IEEE Acoustics, Speech, and Signal Processing Society;
IEEE Press, 1979; Section 8.2, pp. 8.2-1 to 8.2-7 by R. E.
Crochiere), providing a codec passband characteristic
which very closely approximates the passband for U.S. long
distance telephony equipment. This results in a 16-bit,
low-noise signal, which must be treated according to the
steps 2.) and 3.) below.
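
The "rational ratio" behind each conversion is simply the output rate over the input rate in lowest terms. A short sketch, with the hypothetical helper name conversion_ratio:

```python
from fractions import Fraction

def conversion_ratio(input_rate_hz, output_rate_hz):
    """Return (L, M): interpolate by L, then decimate by M."""
    ratio = Fraction(output_rate_hz, input_rate_hz)
    return ratio.numerator, ratio.denominator

cd_ratio = conversion_ratio(44100, 8000)     # interpolate by 80, decimate by 441
wide_ratio = conversion_ratio(16000, 8000)   # plain decimation by 2
```

The large factors for the 44,100 Hz case illustrate why that conversion places much tighter demands on the filter than the 16,000 Hz case.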

A similar band-pass characteristic and rate reduction is
required for the 16,000 Hz speech samples used in this
training technique, with the exception that the transition
band requirements are less demanding, and fewer filter
weights are required to achieve the passband flatness
characteristic desired. FIGS. 2, 3, and 4 again show the Impulse,
Magnitude, and Log Magnitude responses of the codec filter
as it was implemented for one of the pre-training
resampling operations.

2.) Scaling to Normalize the Dynamic Range

The speech sample utterances are then read individually,
and scaled to a dynamic range of 14-bits, in function block
184 of FIG. 1.

3.) mu-law Companding

Each speech sample is then reduced from 16-bit precision
to 8-bit precision using mu-law compression in function
block 186 of FIG. 1, as is well known in the public literature,
such as ("Digital Telephony and Network Integration;" B.
Keiser, E. Strange; Van Nostrand Reinhold Company Inc.,
1985, pp. 26-31). The 8-bit compressed data is then
expanded, also according to the mu-law formula, back to
14-bits.

This results in a simulated telephone channel speech data
set in block 188 of FIG. 1, which has a quantization noise
level which increases and decreases with the signal
strength, and maintains an approximately constant
signal-to-noise ratio. This introduces the audible "crackle" noise
present in telephony voice signals, particularly when the
speaker is loud.
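
Steps 2.) and 3.) can be sketched as below. This is an approximation under stated assumptions: it uses the continuous mu-law formula with mu = 255 and a simple signed 8-bit code (-127 to 127), not the segmented bit layout used by actual telephone codecs, and per-utterance peak scaling is assumed for the 14-bit normalization. All names are hypothetical.

```python
import math

MU = 255.0          # assumed companding constant (North American mu-law)
PEAK_14BIT = 8191   # largest magnitude of a signed 14-bit sample

def scale_to_14_bits(samples):
    """Scale an utterance so that its peak fills a signed 14-bit range."""
    peak = max(abs(s) for s in samples) or 1   # avoid dividing by zero on silence
    return [round(s * PEAK_14BIT / peak) for s in samples]

def mu_law_compress(sample):
    """Map a 14-bit sample to a signed 8-bit code using continuous mu-law."""
    x = max(-1.0, min(1.0, sample / PEAK_14BIT))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return round(y * 127)

def mu_law_expand(code):
    """Invert mu_law_compress, returning a 14-bit sample."""
    y = code / 127
    x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
    return round(x * PEAK_14BIT)

utterance = [0, 1200, -32000, 16000]          # 16-bit input samples
scaled = scale_to_14_bits(utterance)
telephone_like = [mu_law_expand(mu_law_compress(s)) for s in scaled]
```

Because the code steps are logarithmically spaced, the round-trip error grows with the sample magnitude, which is the signal-dependent quantization noise described above.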

This treatment of speech utterance data 180, which may 
have been gathered at various bandwidths higher than tele- 
phone data, is used to train in block 190 of FIG. 1, a speech 
recognizer 50 of FIG. 1 for use over telephone equipment. 
Acoustic training 190 generates phoneme models in block 
192 and quantization code books 105 in FIG. 1. This allows 
practical speech recognition at telephonic bandwidth using 
the Sphinx speech recognition engine. 

RECOGNIZER TRAINING USING SIMULATED 
TELEPHONE CHANNEL DATA 

Dual training sessions are then run so that two code book
sets 105 and two phoneme model sets 192 are created, one
for telephony, and one for high bandwidth. Each set of code
books in 105 and each phoneme model set in 192 is kept
separately, and run separately depending on the requirement
of the user for high bandwidth, local recognition, or
telephony bandwidth. At either bandwidth, the auto-correlation
coefficients are extracted to derive Cepstrum coefficients.
The Cepstrum coefficients are run through the vector
quantizer 104 to determine the nearest code book entry for
the frame. Thus each speech time-series frame is reduced to
three bytes representing the frame, as described in ("Large
Vocabulary Speaker-Independent Continuous Speech
Recognition: The Sphinx System;" Kai-Fu Lee; Carnegie
Mellon University, Department of Electrical and Computer
Engineering; April 1988; CMU-CS-88-148).

The sets of quantized values are sent to the beam search
process 106. The beam search 106 is a grammar guided
Hidden Markov Model search process called a Viterbi beam
search. This grammar guided search uses a word pair
grammar to reduce the search space at any given point.
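
The idea of a Viterbi search with beam pruning can be shown on a toy discrete Hidden Markov Model. This sketch is not the recognizer's search: it works on bare states rather than grammar-constrained word models, the transition table stands in for the constraint that a grammar would supply, and the function names, the two-state model and the beam widths are all hypothetical.

```python
import math

def viterbi_beam(observations, log_start, log_trans, log_emit, beam_width):
    """Best state path, expanding only hypotheses within beam_width of the best."""
    active = {
        state: (score + log_emit[state][observations[0]], [state])
        for state, score in log_start.items()
    }
    for symbol in observations[1:]:
        best = max(score for score, _ in active.values())
        expanded = {}
        for prev, (score, path) in active.items():
            if score < best - beam_width:
                continue                      # pruned: outside the beam
            for state, log_p in log_trans[prev].items():
                candidate = score + log_p + log_emit[state][symbol]
                if state not in expanded or candidate > expanded[state][0]:
                    expanded[state] = (candidate, path + [state])
        active = expanded
    return max(active.values(), key=lambda item: item[0])[1]

def logs(probabilities):
    """Convert a table of probabilities to log probabilities."""
    return {key: math.log(p) for key, p in probabilities.items()}

# Toy model: state A mostly emits VQ symbol 0, state B mostly emits symbol 1.
log_start = logs({"A": 1.0})
log_trans = {"A": logs({"A": 0.5, "B": 0.5}), "B": logs({"B": 1.0})}
log_emit = {"A": logs({0: 0.9, 1: 0.1}), "B": logs({0: 0.1, 1: 0.9})}
path = viterbi_beam([0, 0, 1, 1], log_start, log_trans, log_emit, beam_width=5.0)
```

A narrower beam discards more hypotheses per frame and so runs faster, at the risk of pruning the path that would eventually have scored best.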

Another point of the invention is that the recognition
system can process both local and long distance calls by
setting the cutoff points of the run time data conditioning
and rate conversion filter to a bandwidth
approximating the narrower of the two bandwidths, so that
either type of call will correspond to the bandwidth used in
the channel simulator described above. The 3 dB point and
transition band characteristics should closely approximate
those of the upper transition band of the telephony codec
filter used in the training and described above.

The beam search (block 106) matches time series derived
in the vector quantization to word sequences from within
the word pair grammars defining each context. The
Recognition Server communicates with user applications or
Recognition Clients (block 110). The invention's architecture
can have multiple front ends (workstations) communicating
to a single back end, or multiple front ends communicating
to multiple back ends.

The system is organized and implemented for different
levels of operation. For communication networks with a
very high data rate, the speech samples could be
communicated directly to the system executing the back end, for
front end data compression. A plurality of raw digital speech
data streams could be sent to the server containing the back
end for multiple users. For a telephony system, multiple
channels go to one back end, or multiple users come in to the
front end and back end together.

The system is primarily organized around the speech
recognition functions deployed as speech recognition
servers. The system is guided by any one of a plurality of word
pair grammars the application has chosen as the current 
context. The application has interfaces to the speech recog- 
nition system with Application Program Interface (API) 
calls supporting functions like initializing procedures, status 
codes and commands ("IBM Continuous Speech Recogni- 
tion System Programmers Guide," B. Booth, 1992, currently 
unpublished, available on request). The application will 
request a certain type of operation or ask the recognition 
server to load a certain recognition context and to activate 
the context for recognition when required. The tasks are 
pre-loaded by the server, usually when the application is first 
executed. They are then sequentially activated, as required 
by the activity of the application program. 

A set of API calls in the recognition server (block 108)
allows user applications (block 110) to request the services
of the speech recognition system. User application programs
(block 110) can be running on the same computer as, or a
different computer from, the various components of the
recognition server. If it is on the same computer, the
application program (block 110) might interface with the
recognition server through shared memory and semaphores,
supported by the operating system. If the application program
(block 110) and recognition server are on different computers,
communication can be arranged via an RS232 interface, or
Remote Procedure Calls (RPC), RPC being well known in
the programming literature ("AIX Distributed Environments:
NFS, NCS, RPC, DS Migration, LAN Maintenance
and Everything;" IBM International Technical Support
Centers, Publication GG24-3489, May 8, 1990).

Typical examples of user applications may include: 
Executive Information Systems, Database Access via verbal 
query, software problem reporting systems, and so forth. 

Another example is a telephone answering voice response
unit (VRU) which could call on the recognition server to
take advantage of its services. We have implemented
versions of these servers on the RISC System 6000 (TM) and
PS/2 (TM) with OS/2 (TM).

The DirectTalk 6000 (TM) is a similar telephony VRU
system. In DirectTalk 6000 (TM), rather than dealing with
single telephone lines, the VRU system could require
processing of a T1 line (with 24 conversation channels, possibly
active simultaneously).

The recognition server architecture can handle multiple
clients, as would be required to process such high-volume
telephony applications as DirectTalk (TM).

The user applications can pre-register many contexts: a
restaurant locator, a hard disk help desk, or a software help
desk can all pre-register multiple contexts hierarchically.
With each application, several users can be inputting speech 
streams. Each application will tell the recognition server to 
perform a recognition under a particular context for a 
particular speech stream, as appropriate for the task being 
executed. 

In other words, multiple users dealing with the same API 
interface will register all their tasks, with one, or possibly 
several versions of the recognition server. The system 
arranges to avoid redundantly loading recognition tasks for 
multiple users, by checking if the requested task has already 
been loaded. 
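
The "load once, share among users" behavior can be sketched with a small cache keyed by context name. The class RecognitionServer, its methods and the loader callback below are hypothetical illustrations and do not represent the actual API of the recognition server.

```python
class RecognitionServer:
    """Toy model of context pre-loading shared by several speech streams."""

    def __init__(self, loader):
        self._loader = loader     # callable that builds a context from its name
        self._contexts = {}       # contexts already loaded, keyed by name
        self._active = {}         # currently active context per speech stream

    def load_context(self, name):
        """Load a context only if no earlier request has loaded it."""
        if name not in self._contexts:
            self._contexts[name] = self._loader(name)
        return self._contexts[name]

    def activate(self, stream_id, name):
        """Make a (pre-loaded) context current for one speech stream."""
        self._active[stream_id] = self.load_context(name)

    def active_context(self, stream_id):
        return self._active[stream_id]

load_log = []

def toy_loader(name):
    load_log.append(name)         # record each real load for the demo
    return {"name": name}

server = RecognitionServer(toy_loader)
server.activate("caller-1", "restaurant-type")
server.activate("caller-2", "restaurant-type")   # reuses the loaded context
server.activate("caller-1", "restaurant-name")
```

Two callers activating the same context trigger only one load, while each stream keeps its own notion of which context is current.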

The task building (block 130) has several basic sources
for its input. One is a U.S. English dictionary (block 132),
which is a base dictionary with the pronunciations of twenty
thousand words in it. The supplemental dictionary (block
138) is application specific, and allows for the addition of
the pronunciation of words not found in the base dictionary.
This would typically consist of proper nouns, acronyms, and
the like, which a particular application requires for
recognition.

The base U.S. dictionary (block 132) supplies words and
the phoneme strings drawn on by the Task Builder (block
134). The Task Builder also draws on an appropriate task
Backus-Naur Form (BNF) grammar, from the Task BNF
library (block 136), to determine what can be recognized by
the speech server under the task. For example, in an
application which provides information on area restaurants, a
first context may be the type of restaurant the caller wants,
e.g., French, Italian, Chinese, and a second context, once the
type was established, would be the restaurants in that
particular category. The task builder analyzes the BNF to find
all the words that are required for the pattern matching and
draws out the phoneme representation from the general U.S.
dictionary (block 132). Inevitably, every particular
application has its own sub-vocabulary which must be added to
the system, and these are stored in the supplemental
dictionaries. For example, in a restaurant help desk, there are
generic English words, such as "Italian", "French",
"Spanish", etc., which would be found in the standard U.S.
dictionary. However, restaurant names, particularly in foreign
languages, e.g., "Cherchez Les Femmes", "Chateau Voulez",
but also unusual names for an American restaurant, e.g.,
J. J. Muldoon's, will not be in any normal dictionary, and
must be added to the task supplemental dictionary (block
138). These supplemental dictionaries (block 138) can also
contain local vocabulary that is in the base General English
dictionary (block 132), in which case they override its
pronunciations.

The task builder (block 134) analyzes the input BNF 
grammar, and generates a grammar which is a list of each 
word in the grammar and a sub-list of all the words that can 
follow. Thus each word in the grammar has a list attached to 
it of legal following words and a pointer to the phoneme 
representation of each word, (called phoneme models 192 in 
FIG. 1). The phoneme models 192 are Hidden Markov 
Models of observing the various VQ values. The hidden 
Markov models are a group of discrete probability distribu- 
tions, for the VQ values (as in block 104). These provide the 
probability of the occurrence of VQ values, given that the 
hidden Markov state machine is in a particular state within 
a phoneme. The public literature contains excellent descrip- 
tions of Hidden Markov Models in ("A Tutorial on Hidden 
Markov Models and Selected Applications in Speech Rec- 
ognition," L. Rabiner, Readings in Speech Recognition; Ed.: 
A. Waibel, K. Lee; Morgan Kaufmann; 1990; pp. 267-296), 
as well as elsewhere. 
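
The word-list grammar described above can be sketched as follows. This is a minimal illustration in Python, not the task builder itself: it assumes the grammar has already been expanded into example sentences instead of being parsed from BNF, and the names GENERAL_DICTIONARY, SUPPLEMENTAL_DICTIONARY, lookup_phonemes and build_word_grammar, together with the phoneme spellings, are hypothetical.

```python
# Hypothetical base dictionary (block 132) and task supplemental dictionary (block 138).
GENERAL_DICTIONARY = {
    "italian": ["IH", "T", "AE", "L", "Y", "AH", "N"],
    "french": ["F", "R", "EH", "N", "CH"],
    "restaurant": ["R", "EH", "S", "T", "R", "AA", "N", "T"],
}
SUPPLEMENTAL_DICTIONARY = {
    "muldoon's": ["M", "AH", "L", "D", "UW", "N", "Z"],
    # Assumed local pronunciation that overrides the base entry.
    "restaurant": ["R", "EH", "S", "T", "ER", "AA", "N", "T"],
}


def lookup_phonemes(word):
    """Return the phoneme list for a word; supplemental entries win over base entries."""
    if word in SUPPLEMENTAL_DICTIONARY:
        return SUPPLEMENTAL_DICTIONARY[word]
    return GENERAL_DICTIONARY[word]


def build_word_grammar(sentences):
    """Map each word to its legal following words and its phoneme representation."""
    grammar = {}
    for sentence in sentences:
        words = sentence.lower().split()
        for position, word in enumerate(words):
            entry = grammar.setdefault(
                word, {"followers": set(), "phonemes": lookup_phonemes(word)}
            )
            if position + 1 < len(words):
                entry["followers"].add(words[position + 1])
    return grammar
```

A call such as build_word_grammar(["italian restaurant", "french restaurant"]) yields one entry per word, each holding the set of words that may follow it and the phoneme list that a search would use to assemble the word model.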

The Beam Search (block 106) uses word models made of
concatenated HMM phoneme models 192 from a large table 
of context sensitive triphones which are generated during the 
training process. These are used to make an optimal estimate 
of the word sequence which best explains the observed 
sequence of VQ values. The beam searcher (block 106) uses 
the word grammars to select the phoneme models 192 from 
which to construct the word models used in the search. 

The user applications control the recognition server. For 
example, DirectTalk/2 (TM), an IBM Program Product
described in ("IBM CallPath DirectTalk/2 General Informa- 
tion and Planning Manual;" International Business 
Machines Publication No. GB35-4403-0; 1991), could be a 
user application; it is able to answer the phone and perform 
restaurant locator functions. The restaurant locator application
would use the DirectTalk/2 (TM) system to indicate to the
recognition server that it has sixteen contexts and issue a 
request to pre-load the contexts which are part of the 
Restaurant Locator help desk. As the application progresses, 
it requests the context switching of the recognition server. A 
user calls via the telephone for telephone help. The restau- 
rant locator then requests the recognition server to perform 
a voice recognition under the first level context. Control and 
data are exchanged over the API between the recognition 
server and the user application. Multiple instances of the
DirectTalk/2 (TM) system could use the same recognition 
server. 

The speech recognition server acquires speech data until
a period of silence (user adjustable, but most commonly 0.6 seconds)
is observed. Recognition is then terminated, and it is
assumed that the person is done speaking.
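
The silence-based termination rule can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how silence is detected, so the per-frame energy threshold, the 10 ms frame length and the function name find_end_of_speech are hypothetical; only the 0.6 second default comes from the description.

```python
def find_end_of_speech(frame_energies, frame_ms=10, silence_ms=600, energy_threshold=1e-3):
    """Return the index of the frame at which recognition would stop, or None.

    A frame counts as silent when its energy is below energy_threshold
    (an assumed detector). Recognition stops once silence_ms of consecutive
    silent frames have been seen.
    """
    frames_needed = -(-silence_ms // frame_ms)  # ceiling division
    silent_run = 0
    for index, energy in enumerate(frame_energies):
        if energy < energy_threshold:
            silent_run += 1
            if silent_run >= frames_needed:
                return index
        else:
            silent_run = 0  # speech resumed, so restart the count
    return None
```

Making silence_ms a parameter mirrors the user-adjustable period mentioned above; a shorter value ends recognition sooner but risks cutting off a speaker who pauses mid-utterance.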

The speech recognition system described herein is architected
to allow multiple deployments, on multiple hardware
platforms, and multiple software configurations. For 
example, one possible architecture is shown in FIG. 5, which 
provides a physical mapping of the logical architecture 50, 
discussed above, onto a physical implementation of work- 
stations connected via a local area network 160. Each 
workstation 150, 150', 150" in this architecture can run 
multiple independent user applications, and each is master to 
the recognition server 50 as a slave processor. The PBX 170 
is connected to outside telephone lines and delivers a 
telephony bandwidth data stream to the analog-to-digital conversion
100 of the recognition server 50, also shown in FIG. 1.
The text representing recognized speech is returned from
the recognition server to the user applications in workstations
150, 150', 150".

TRAINING PROCESS

The training procedure uses a large library of known 
utterances and their textual transcripts, to estimate the 
parameters of the phoneme HMMs 192 used in pattern
matching of word models to text in the beam search process.

First, the transcripts are used to retrieve the phonemes, 
representing the pronunciation of the words in the training 
set, from the General English dictionary. 

Next, the parameters of phoneme HMMs 192 are estimated
in the context of preceding and following phonemes
(called triphones) to provide for effective estimation of
coarticulation effects. The estimation procedure used is the
Baum-Welch forward/backward iteration algorithm
described in ("A Tutorial on Hidden Markov Models and
Selected Applications in Speech Recognition," L. Rabiner;
Readings in Speech Recognition; Ed.: A. Waibel, K. Lee;
Morgan Kaufmann; 1990; pp. 267-296). The parameters of
the HMMs are iteratively adjusted so as to maximize the
probability that the trained triphone HMMs would have 
generated the time series of VQ values observed in the 
training set. 

There are many parameters for every hidden Markov
phoneme model, there being 7 states and 12 transition arcs
in each hidden state machine. Associated with each transition
arc are 256 discrete elements in the probability
distribution for each of the three code books. The
triphone HMM parameters that result from the training
procedure are clustered sharply to reduce the number of
triphones required to adequately represent the coarticulation
effects present in continuous speech.
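
To make the parameter count concrete, the figures above can be multiplied out. The short sketch below uses only the numbers given in the text (12 transition arcs, 256 elements, three code books); the function name and the decision to count one transition probability per arc are assumptions made for illustration.

```python
TRANSITION_ARCS = 12    # arcs per phoneme model, from the text
CODE_BOOK_SIZE = 256    # discrete elements per distribution, from the text
CODE_BOOKS = 3          # code books, from the text


def output_probabilities_per_model(arcs=TRANSITION_ARCS,
                                   code_book_size=CODE_BOOK_SIZE,
                                   code_books=CODE_BOOKS):
    """Count the discrete output probabilities attached to one phoneme model."""
    return arcs * code_book_size * code_books


# Assumption: one transition probability per arc is stored alongside the outputs.
total_per_model = output_probabilities_per_model() + TRANSITION_ARCS
```

Under these assumptions each phoneme model carries 9216 output probabilities, and every distinct triphone multiplies that figure again, which suggests why the clustering step is needed.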

Training is performed with a combination of low band- 
width speech which is gathered via the local telephone 
exchange and high bandwidth speech from a microphone. 
The high bandwidth speech is processed by the telephone
channel simulator 185 described herein, in accordance with 
the invention. All three code books are compiled at this 
stage. The code book versions include Cepstrum, differential 
Cepstrum, Power and differential Power as described in
("Large Vocabulary Speaker Independent Continuous
Speech Recognition: The Sphinx System;" Kai-Fu Lee; 
Carnegie Mellon University, Department of Electrical and 
Computer Engineering; April 1988; CMU-CS-88-148). 

Each of these three code books is stored in quantization
code books 105 and is used in the run time Vector Quantization
process. The effect of the telephone network is
simulated here by data pre-conditioning so that the statistical 
properties of these feature code books will be adjusted in the 
same manner in which the public telephony network will 
adjust them. This procedure has resulted in a substantial
increase in accuracy in actual telephonic speech recognition
with calls originating from a variety of locations in the 
continental U.S. 

Reference can be made to the flow diagram of FIG. 6 
which describes the telephony channel simulation process 
200 for training a speech recognizer 50 to respond to speech 
obtained from telephone systems, for example, through a 
PBX 170. The flow diagram of FIG. 6 represents a computer 
program method which can be executed on the data proces- 
sor 50 of FIG. 5. 

The process 200 starts with step 202 in which a speech
data set, whose bandwidth is higher than telephone bandwidth,
is input to a speech recognition training processor 50.
Example high bandwidth speech data sets are identified in
references ("Speech Corpora Produced on CD-ROM Media
by The National Institute of Standards and Technology
(NIST)," April 1991; "DARPA Resource Management Continuous
Speech Database (RM1) Speaker-Dependent Training
Data," September 1989, NIST Speech Discs 2-1.1, 2-2.1
(2 Discs), NTIS Order No. PB89-226666; "DARPA
Resource Management Continuous Speech Database (RM1)
Speaker-Independent Training Data," November 1989, NIST
Speech Disc 2.3.1 (1 Disc), NTIS Order No. PB90-500539;
"DARPA Extended Resource Management Continuous
Speech Speaker-Dependent Corpus (RM2)," September
1990, NIST Speech Discs 3-1.2, 3-2.2, NTIS Order No.
PB90-501776; "DARPA Acoustic-Phonetic Continuous
Speech Corpus (TIMIT)," October 1990, NIST Speech Disc
1-1.1, NTIS Order No. PB91-0505065; "Studio Quality
Speaker-Independent Connected-Digit Corpus (TIDIGITS),"
NIST Speech Discs 4-1.1, 4-2.1, 4-3.1, NTIS Order
No. PB91-505592). This corresponds to data input block
180 in FIG. 1.

Then, in step 204 in FIG. 6, the speech data set is
decimated to obtain a decimated speech data set having the
telephone bandwidth. This corresponds to function block
182 of FIG. 1. The decimated speech data set may have
a bandwidth which is any bandwidth lower than the higher
bandwidth of the input speech data set. Decimation processes
are described in reference ("A General Program to Perform
Sampling Rate Conversion of Data by Rational Ratios;"
from "Programs for Digital Signal Processing," Ed.: Digital
Signal Processing Committee of the IEEE Acoustics,
Speech, and Signal Processing Society; IEEE Press, 1979;
Section 8.2, pp. 8.2-1 to 8.2-7 by R. E. Crochiere).
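
The decimation step can be sketched in simplified form. The example below is not the rational-ratio program cited above: it assumes an integer decimation factor (for instance 16 kHz to 8 kHz with a factor of 2) and uses a Hamming-windowed sinc low-pass filter chosen purely for illustration. The function names and the default filter length are hypothetical.

```python
import math


def lowpass_taps(factor, taps_per_side=16):
    """Hamming-windowed sinc low-pass filter with cutoff at the new Nyquist frequency."""
    length = 2 * taps_per_side + 1
    cutoff = 0.5 / factor  # cycles per input sample
    taps = []
    for n in range(length):
        k = n - taps_per_side
        if k == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [tap / total for tap in taps]  # normalize to unity gain at DC


def decimate(samples, factor, taps_per_side=16):
    """Low-pass filter the samples, then keep every factor-th filtered sample."""
    taps = lowpass_taps(factor, taps_per_side)
    output = []
    for center in range(0, len(samples), factor):
        accumulator = 0.0
        for j, tap in enumerate(taps):
            index = center + j - taps_per_side
            if 0 <= index < len(samples):  # samples outside the signal count as zero
                accumulator += tap * samples[index]
        output.append(accumulator)
    return output
```

Filtering before discarding samples is what keeps energy above the new Nyquist frequency from aliasing into the telephone band; the filter is evaluated only at the retained output positions, which avoids computing samples that would be thrown away.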

Then, in step 206 in FIG. 6, a bandpass digital filter,
which characterizes the transmission characteristics of
telephone equipment, is applied to the decimated speech data set. This
corresponds to function block 182 of FIG. 1. This is done to
obtain a filtered speech data set. The bandpass digital filter
should have a maximally flat design algorithm.

Then, in step 208 in FIG. 6, the amplitude of the filtered 
speech data set is rescaled so that its maximum dynamic
range matches the maximum range of uncompanded tele- 
phone speech. This corresponds to function block 184 of 
FIG. 1. This is done to obtain a rescaled speech data set. The 
rescaling step can result in the maximum dynamic range 
matching the maximum dynamic range of uncompanded 
mu-law telephone speech. Alternately, the rescaling step can 
result in the maximum dynamic range matching the maxi- 
mum dynamic range of uncompanded A-law telephone 
speech. 

Then, in step 210 in FIG. 6, the rescaled speech
data set is modified with quantization noise representing a sequence of
companding and uncompanding a speech signal in a telephone
system. This corresponds to function block 186 of
FIG. 1. This is done to obtain a modified speech data set. The
modifying step can have quantization noise as mu-law noise.
Alternately, the modifying step can have quantization noise
as A-law noise.
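
The rescaling and mu-law noise steps can be sketched together. The code below is a hedged illustration, not the simulator of block 185: it assumes samples normalized to the range -1.0 to 1.0, uses the continuous mu-law formula with mu = 255 followed by uniform 8-bit quantization (not the segmented codec used in telephone equipment), and all function names are hypothetical.

```python
import math

MU = 255.0  # standard mu-law constant


def rescale(samples, target_peak=1.0):
    """Scale samples so the largest magnitude equals target_peak (assumed full scale)."""
    peak = max(abs(sample) for sample in samples)
    if peak == 0:
        return list(samples)
    return [sample * target_peak / peak for sample in samples]


def mu_law_compress(x):
    """Compand a sample in [-1, 1] with the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Invert mu_law_compress."""
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)


def add_mu_law_noise(samples, levels=256):
    """Compand, quantize to the given number of levels, and uncompand each sample."""
    step = 2.0 / (levels - 1)
    noisy = []
    for x in samples:
        companded = mu_law_compress(x)
        quantized = round((companded + 1.0) / step) * step - 1.0
        noisy.append(mu_law_expand(quantized))
    return noisy
```

Because quantization happens in the companded domain, the resulting error grows with signal amplitude, which is the characteristic noise pattern that training data would need to carry to resemble telephone speech. An A-law variant would replace only the two companding functions.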

Then, in step 212 in FIG. 6, the modified speech data
set is input into the speech recognition processor 50, to train statistical
pattern matching data units. This corresponds to output
data block 188 of FIG. 1. The simulated telephone channel
speech 185 is then used by the acoustic training process 190
to generate phoneme models 192 and
code books 105 characteristic of telephone speech.

Then, in step 214 in FIG. 6, speech recognition can be
performed using the speech recognition processor 50, on 
voice signals from a telephone system, such as, signals from 
the PBX 170 of FIG. 5. 

It should be noted that the conversion of high bandwidth 
speech using the telephony channel simulator (block 185) is
not limited to continuous speech recognizers, but applies to 
a variety of speech recognition processors, such as the IBM
Tangora Dictation System; Dragon Systems, Newton,
Mass., Dragon 30k Dictate; Kurzweil Applied Intelligence,
Voice Report, Waltham, Mass.; and others described
in ("The Spoken Word," Kai-Fu Lee, et al., Byte Magazine,
July 1990, Vol. 15, No. 7; pp. 225-232).

While the invention has been described with reference to 
a preferred embodiment, it will be understood by those 
skilled in the art that various changes can be made to the 
architecture without departing from the spirit and scope of 
the invention. Accordingly, the invention shall be limited 
only as specified in the following claims. 

We claim: 

1. A method for training a speech recognition processor to 
respond to speech obtained from telephone systems, com- 
prising the steps of: 

inputting a speech data set to a speech recognition training 
processor, said data set having a bandwidth higher than 
a telephone bandwidth; 

decimating said inputted speech data set in said training 
processor to obtain a decimated speech data set having 
said telephone bandwidth; 

applying a bandpass digital filter to said decimated speech 
data set in said training processor, said filter character- 
izing transmission characteristics of telephone equip- 
ment, for obtaining a filtered speech data set; 

rescaling the amplitude of said filtered speech data set in 
said training processor, so that the maximum dynamic 
range of said filtered speech data set matches the 
maximum dynamic range of uncompanded telephone 
speech, to obtain a rescaled speech data set; 

modifying said rescaled speech data set in said training 
processor, with quantization noise representing com- 
panding and uncompanding a speech signal in a tele- 
phone system, to obtain a modified speech data set; 

inputting said modified speech data set into a hidden 
Markov model speech recognition processor to train 
statistical pattern matching data units; 

performing speech recognition on voice signals from a 
telephone system with said speech recognition proces- 
sor. 

2. The method of claim 1 wherein: 

said telephone bandwidth is any bandwidth lower than 
said higher bandwidth. 

3. The method of claim 1 which further comprises:
said bandpass digital filter has a maximally flat design
algorithm.

4. The method of claim 1 wherein said rescaling step 
results in a maximum dynamic range matching a maximum 
dynamic range of uncompanded mu-law telephone speech. 

5. The method of claim 1 wherein said rescaling step 
results in a maximum dynamic range matching a maximum 
dynamic range of uncompanded A-law telephone speech. 

6. The method of claim 1 wherein said modifying step has 
quantization noise as mu-law noise. 

7. The method of claim 1 wherein said modifying step has 
quantization noise as A-law noise. 
[57] ABSTRACT

Apparatus for an integrated architecture for an ex- 
tended multilevel secure database management system. 
The multilevel secure database management system 
processes security constraints to control certain unau- 
thorized inferences through logical deduction upon 
queries by users and is implemented when the database 
is queried through the database management system, 
when the database is updated through the database 
management system, and when the database is designed 
using a database design tool. 
SYSTEM FOR MULTILEVEL SECURE DATABASE
MANAGEMENT USING A KNOWLEDGE BASE
WITH RELEASE-BASED AND OTHER SECURITY
CONSTRAINTS FOR QUERY, RESPONSE AND
UPDATE MODIFICATION

BACKGROUND OF THE INVENTION 

It is possible for the users of a database management 
system to draw inferences from the information that 
they obtain from the database. The inference process 
can be harmful if the inferred knowledge is something 
that the user is not authorized to acquire. That is, a user 
acquiring information which he is not authorized to
know has come to be known as the inference problem in
database security. In a multilevel operating environment,
the users are cleared at different security levels as
they access a multilevel database where the data is classified
at different sensitivity levels. A multilevel secure
database management system (MLS/DBMS) manages a
multilevel database where its users cannot access data to
which they are not authorized. Currently available
multilevel secure database management systems cannot
provide a solution to the inference problem, where
users of the system issue multiple requests and conse- 
quently infer unauthorized knowledge. 

Two distinct approaches to handling the inference 
problem have been proposed in the past. They are: 

(i) Handling of inferences during database design.

(ii) Handling of inferences during query processing.

The work reported in Morgenstern, M., May 1987,
"Security and Inference in Multilevel Database and
Knowledge Base Systems," Proceedings of the ACM
SIGMOD Conference, San Francisco, Calif.; Hinke, T.,
April 1988, "Inference Aggregation Detection in Database
Management Systems," Proceedings of the IEEE
Symposium on Security and Privacy, Oakland, Calif.;
Smith, G., May 1990, "Modelling Security-Relevant
Data Semantics," Proceedings of the 1990 IEEE Symposium
on Security and Privacy, Oakland, Calif.; and
Lunt, T., May 1989, "Inference and Aggregation, Facts
and Fallacies," Proceedings of the IEEE Symposium
on Security and Privacy, Oakland, Calif., focuses on
handling inferences during database design where suggestions
for database design tools are given.

In contrast, the work reported in Thuraisingham, B., 
December 1987, "Security Checking in Relational 
Database Management Systems Augmented with Inference
Engines," Computers and Security, Volume 6, No.
6; Thuraisingham, B., August 1990, The Use of Conceptual
Structures in Handling the Inference Problem,
Technical Report M90-55, The MITRE Corporation,
Bedford, Mass.; Keefe, T., B. Thuraisingham, and W.
Tsai, March 1989, "Secure Query Processing Strategies,"
IEEE Computer, Volume 22, No. 3, pp. 63-70
focuses on handling inferences during query processing. 

Other work on handling the inference problem can be 
found in Buczkowski, L.J., and Perry, E.L., "Database 
Inference Controller," Interim Technical Report, Ford
Aerospace Corporation, February 1989, where an expert
system tool which could be used by the System
Security Officer off-line to detect and correct logical
inferences is proposed. Rowe, N., February 1989, "Inference
Security Analysis Using Resolution Theorem-Proving,"
Proceedings of the 5th International Conference
on Data Engineering, Los Angeles, Calif., investigates
the use of Prolog for handling inferences.

In Thuraisingham, B., August 1990, The Use of Conceptual
Structures in Handling the Inference Problem,
Technical Report M90-55, The MITRE Corporation, 
Bedford, Mass., various strategies that users could utilize
to draw inferences are identified. This set of strategies is
more complete than the one proposed in Denning,
D.E., et al., "Views as a Mechanism for Classification in
Multilevel Secure Database Management Systems,"
Proceedings of the IEEE Symposium on Security and
Privacy, Oakland, Calif., 1986. In Thuraisingham, B.,
August 1990, The Use of Conceptual Structures in Handling
the Inference Problem, Technical Report M90-55,
The MITRE Corporation, Bedford, Mass., some preliminary
ideas on novel approaches to handling the
inference problem are discussed. These include approaches
based on mathematical programming, inductive
inference, information theory and game theory.
Further, in Thuraisingham, B., August 1990, The Use of
Conceptual Structures in Handling the Inference Problem,
Technical Report M90-55, The MITRE Corporation,
Bedford, Mass., the complexity of the inference problem
is analyzed based on concepts in recursive function
theory.

The present application discloses an apparatus and 
method for designing a multilevel secure database man- 
agement system that can resolve the inference problem 
via the effective use of security constraints. In the new 
system, some security constraints are handled during 
the query operation, some during the update operation, 
some during the database design operation. The major 
advance achieved by the invention disclosed herein 
over prior art is the use of security constraints in a novel 
way to handle the inference problem. In addition, pro- 
totypes which effectively handle these constraints are 
also disclosed. Further advances relate to the use of 
conceptual structures for representing and reasoning 
about multilevel applications, the development of a 
logic for secure data/knowledge base management sys- 
tems and the development of a knowledge base infer- 
ence controller. 

SUMMARY OF THE INVENTION 

The invention disclosed herein is an apparatus and 
method for querying, designing and updating a multi- 
level secure database management system in order to 
resolve the inference problem. 

A method for processing security constraints in a
multilevel secure database management system is disclosed.
Security constraints are rules that assign security
levels to data. This method is based on specifying
security constraints as Horn clauses. The apparatus and
method disclosed here use nine types of security constraints:

1. Simple constraints that classify a database, a rela- 
tion or an attribute; 

2. Content-based constraints that classify any part of 
the database depending on the value of some data; 

3. Event-based constraints that classify any part of the 
database depending on the occurrence of some 
real-world event; 

4. Association-based constraints that classify associa- 
tions between attributes and relations; 

5. Release-based constraints that classify any part of 
the database depending on the information that has 
been previously released; 

6. Aggregate constraints that classify collections of 
data; 
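
The idea of constraints as rules that assign security levels to data can be sketched as follows. This is an illustrative toy, not the disclosed apparatus: each constraint is modeled as a body (a condition on a row) and a head (a level), loosely mirroring a Horn clause. The level names, the SHIP relation, the sample rule contents and the function names are all hypothetical.

```python
# Assumed ordering of security levels, lowest to highest.
LEVELS = {"Unclassified": 0, "Confidential": 1, "Secret": 2, "TopSecret": 3}

# Each constraint pairs a body (predicate over relation and row) with a head (level).
CONSTRAINTS = [
    # Simple constraint: every row of the SHIP relation is at least Confidential.
    (lambda relation, row: relation == "SHIP", "Confidential"),
    # Content-based constraint: the level depends on the value of some data.
    (lambda relation, row: relation == "SHIP" and row.get("name") == "Florida", "Secret"),
]


def classify(relation, row):
    """Return the highest level assigned to the row by any applicable constraint."""
    level = "Unclassified"
    for body, head_level in CONSTRAINTS:
        if body(relation, row) and LEVELS[head_level] > LEVELS[level]:
            level = head_level
    return level


def answer_query(relation, rows, clearance):
    """Release only the rows whose assigned level does not exceed the user's clearance."""
    return [row for row in rows if LEVELS[classify(relation, row)] <= LEVELS[clearance]]
```

Taking the maximum over all applicable constraints is one plausible way to resolve overlapping rules; event-based, association-based and release-based constraints would need extra state (events seen, attribute combinations, release history) that this sketch leaves out.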
GetFindName makes the identification by matching the caller's keyed input (SMSI caller id (#), digit name,
extension number) against a list of user ids generated from the Host. When the extension number or digit name
is entered, a minimum number of keyed inputs are received and accepted by the Channel Process for identification.
The Channel Process uses the keyed inputs to retrieve a list of user ids from the Host. Then it searches
the list of user ids for a unique match.

When GetFindName has identified the caller or destination, it requests the user profile stored in the data 
base. For example, at log on to system services, the caller has the option of entering a keyed input of pound
(#), his extension number, or his digit name. If the caller enters a keyed input of pound (#), GetFindName uses
the SMSI caller id; if the caller enters his name or extension, GetFindName uses the keyed input for identification.

In any case, when the keyed input is entered, GetFindName uses the list provided by the Host to identify 
the extension number or digit name. When a caller enters an extension or digit name as the destination when 
sending a message to another party, GetFindName uses the list provided by the Host in order to identify the 
receiver so that the message is sent to the correct destination. 
PARAMETERS 

CALLER'S USER ID. This parameter means that GetFindName is used to identify a caller. 
DESTINATION USER ID. This parameter means that GetFindName is used to identify a receiver. 
EDGES 

0 = Found 

1 = Not found 

2 = Input incomplete. A time out has occurred. 
T1 = No input. A time out has occurred.

T2 = Last time out has occurred.
HUP = The caller has hung up.
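
The unique-match search performed by GetFindName can be sketched as follows. This is a minimal illustration under stated assumptions: the letter-to-digit keypad layout, the prefix-matching rule, the mapping of an ambiguous input to edge 2, and the names digit_name and find_user are all hypothetical; only the edge numbers 0, 1 and 2 come from the description.

```python
# Assumed telephone keypad layout for converting a name to its digit name.
KEYPAD = {"abc": "2", "def": "3", "ghi": "4", "jkl": "5",
          "mno": "6", "pqrs": "7", "tuv": "8", "wxyz": "9"}
LETTER_TO_DIGIT = {letter: digit for letters, digit in KEYPAD.items() for letter in letters}

EDGE_FOUND, EDGE_NOT_FOUND, EDGE_INCOMPLETE = 0, 1, 2


def digit_name(name):
    """Convert a name to the digits a caller would key to spell it."""
    return "".join(LETTER_TO_DIGIT[c] for c in name.lower() if c in LETTER_TO_DIGIT)


def find_user(keyed_input, users):
    """Search (user_id, name, extension) tuples for a unique match on the keyed input."""
    matches = [
        user_id
        for user_id, name, extension in users
        if extension.startswith(keyed_input) or digit_name(name).startswith(keyed_input)
    ]
    if len(matches) == 1:
        return EDGE_FOUND, matches[0]
    if not matches:
        return EDGE_NOT_FOUND, None
    return EDGE_INCOMPLETE, None  # several candidates remain, so more keys are needed
```

With users "Smith" and "Smythe", the input 76 is ambiguous while 764 identifies Smith alone, which shows why only a minimum number of keyed inputs is needed before a match can be attempted.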

GetFindPassword 

The GetFindPassword action compares the keyed input password with the password defined in the user 
profile. The last key that is entered must be a # key. 
PARAMETERS NONE REQUIRED. 
EDGES 

0 = Password correct 

1 = Password false 

2 = Input incomplete (time out) 

= Mistake keyed. Stops the process in the event of a mistake. 
T1 = Time out 
T2 = Last repeat time 
HUP = The caller has hung up. 

EvaluateData 

This action is used to test the values of system variables with other variables or constants. The flow of the 
state table can then be altered based on the results of the evaluation. 
PARAMETERS 

PARM 1 : Variable id or constant to test 
PARM 2: Variable id or constant 
EDGES 

0 = Parm 1 less than Parm 2 

1 = Parm 1 equal to Parm 2 

2 = Parm 1 greater than Parm 2 
HUP = The caller has hung up. 
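
The three comparison edges of EvaluateData can be sketched as a small function plus a lookup that picks the next state. The edge numbers follow the list above; the function names, the state names and the retry example are hypothetical illustrations.

```python
EDGE_LESS, EDGE_EQUAL, EDGE_GREATER = 0, 1, 2


def evaluate_data(parm1, parm2):
    """Return the edge number for comparing Parm 1 with Parm 2."""
    if parm1 < parm2:
        return EDGE_LESS
    if parm1 == parm2:
        return EDGE_EQUAL
    return EDGE_GREATER


# Hypothetical state-table row: the state to enter for each edge.
NEXT_STATE = {
    EDGE_LESS: "prompt_again",
    EDGE_EQUAL: "give_up",
    EDGE_GREATER: "give_up",
}


def next_state(tries, max_tries=3):
    """Choose the next state from the number of attempts made so far."""
    return NEXT_STATE[evaluate_data(tries, max_tries)]
```

Keeping the comparison separate from the edge-to-state mapping reflects how a state table alters its flow based on the result of the evaluation without embedding the comparison logic in each state.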

AssignData

This action can be used in the state table to perform simple arithmetic or string concatenation. It can be 
used to preset variables to specific values before using them in an action, or as counters in the state table. 

For example, the AssignData action can be used to do loop processing. If there are three prompts for a 
password, then after the first prompt, AssignData is used to loop back to the first prompt. The AssignData action
can also be used to pre-assign a variable to a given value before calling another action.
PARAMETERS 

PARM 1: Operation (1 Add; 2 Subtract; 3 Multiply; 4 Divide; 5 Assign only).
PARM 2: Variable id of buffer to contain the results
PARM 3: Variable id or constant for the first operand; if Parm 1 is not 5, then also use this.
PARM 4: Variable id or constant.
EDGES

0 = Assignment complete;

1 = Assignment overflowed buffer;
HUP = The caller has hung up.
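
The AssignData operations and edges can be sketched as follows. The operation codes and the two edge numbers come from the parameter list above; everything else is an assumption made for illustration, including the use of a dictionary for variables, string keys as variable ids, integer division, and a result buffer limited to a fixed number of characters.

```python
EDGE_COMPLETE, EDGE_OVERFLOW = 0, 1
ADD, SUBTRACT, MULTIPLY, DIVIDE, ASSIGN_ONLY = 1, 2, 3, 4, 5


def assign_data(variables, operation, result_id, first, second=None, buffer_size=4):
    """Apply the operation and store the result, returning the edge number.

    Operands given as strings are treated as variable ids; anything else is
    a constant (an assumed convention). A zero divisor is not handled here.
    """
    def value_of(operand):
        return variables[operand] if isinstance(operand, str) else operand

    a = value_of(first)
    if operation == ASSIGN_ONLY:
        result = a
    else:
        b = value_of(second)
        if operation == ADD:
            result = a + b
        elif operation == SUBTRACT:
            result = a - b
        elif operation == MULTIPLY:
            result = a * b
        elif operation == DIVIDE:
            result = a // b
        else:
            raise ValueError("unknown operation code")
    if len(str(result)) > buffer_size:
        return EDGE_OVERFLOW  # result does not fit, so the buffer is left unchanged
    variables[result_id] = result
    return EDGE_COMPLETE
```

Used as a counter, a state table would first call the assign-only operation to preset a variable to zero and then the add operation with a constant of 1 on each pass through the password prompt.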



PlayVoice 



Like the PlayPrompt action, this action plays digitized data on the voice channel. It is used to play voice 
segments, audio names, user greetings, or user messages (voice mail) from either the Host data base or the
RecordVoice workspace area. This action is used after recording voice to allow the user to verify what he has
recorded before saving it; or, in the case of user messages, it is used before sending the messages to the destination
mailboxes. Each user message is assigned an active message number. This is the pointer, which is
activated by the current message header that is under examination.
PARAMETERS

PARM 1: VOICE TYPE 

1 Voice segment (PlayVoice is used only by administrator to record or play voice segments.) 

2 Audio name 

3 User greeting 

4 User message (PlayVoice uses the variable active message number specified by GetKey.)

The following parameters are required depending on the voice type: 
VOICE SEGMENT 

Parm 2: Numeric buffer name containing the voice segment id 
Parm 3: Numeric buffer name containing the language code. 
AUDIO NAME

Parm 2: Character buffer name containing the user id. 
USER GREETING 
Parm 2: Character buffer name containing the user id 
Parm 3: Numeric buffer name containing the greeting entry number. 
NOTE: For all voice types, Parm 2 can provide the workspace area. The workspace area is where the user
can play the voice data that has been recently recorded, before making a decision to record, save, or delete
the voice data.
EDGES

0 = Play completed

1 = Voice channel problem

2 = Voice record not found 
HUP = The caller has hung up. 



RecordVoice

This action is used to record voice as digitized voice data on the system disk into a voice segment, audio 
name, user greeting, or user message. 

The voice data is first recorded into a temporary workspace area from which it can be replayed and verified 
with the PlayVoice action before storing it at its final destination through SaveVoice. 
PARAMETERS

PARM 1: VOICE TYPE 

1 Voice segment (RecordVoice is used only by the system administrator to record or play voice segments.) 

2 Audio name 

3 User greeting 

4 User message (RecordVoice uses the variable active message number specified by GetKey.)
EDGES 

0 = Recording completed 

1 = Voice channel problem 

2 = No voice recorded

3 = Disk full 

T1 = Time out. xx seconds remain to record before the maximum time is reached. The value is specified by the System
Administrator.

T2 = The maximum recording time has been reached.
HUP = The caller has hung up.

SaveVoice

This action saves previously recorded voice data for the specified voice type. When recording voice, the 
data is always recorded into a temporary workspace first. This action copies the voice data from the workspace
area to its destination (for example, voice segment id, audio name, or user greeting). 
PARAMETERS 
PARM 1: VOICE TYPE 

1 Voice segment (SaveVoice is used only by the system administrator to record or play voice segments.) 
2 Audio name

3 User greeting 

4 User message (SaveVoice uses the variable active message number specified by GetKey.). 
The following parameters are required depending on the voice type: 

VOICE SEGMENT 
Parm 2: Numeric buffer name containing the voice segment id
Parm 3: Numeric buffer name containing the language code. 
AUDIO NAME 

Parm 2: Character buffer name containing the user id. 
USER GREETING 
Parm 2: Character buffer name containing the user id

Parm 3: Numeric buffer name containing the greeting entry number. 
USER MESSAGE 

Parm 2: Numeric buffer name containing the receiver id 
Parm 3: Mailbox id. 
EDGES

0 = Save successful 

1 = Save unsuccessful 
HUP = The caller has hung up. 

DeleteVoice



This action deletes previously recorded voice data for the specified voice type. 
PARAMETERS 
PARM 1: VOICE TYPE 

1 Voice segment (DeleteVoice is used only by the system administrator to record or play voice segments.)

2 Audio name 

3 User greeting 

4 User message (DeleteVoice uses the variable active message number specified by GetKey.). 
The following parameters are required depending on the voice type: 

VOICE SEGMENT

Parm 2: Numeric buffer name containing the voice segment id
Parm 3: Numeric buffer name containing the language code.
AUDIO NAME

Parm 2: Character buffer name containing the user id.
USER GREETING

Parm 2: Character buffer name containing the user id 

Parm 3: Numeric buffer name containing the greeting entry number. 

NOTE: For all voice types, Parm 2 provides the workspace area. 

EDGES 

0 = Delete successful

1 = Delete unsuccessful

2 = Voice record not found 
HUP = The caller has hung up. 

CheckStorage

This action is used to check system resources in order to allow an alternate flow through an application 
based on the resources available. It is normally used at the beginning of an application to determine if there 
are any storage problems. It is also used before recording to determine if there is space available and whether
or not the item already exists. 
PARAMETERS 
Calling parameters 

PARM 1: Resource or item conditions to check: 

1 Voice segment exists 

2 Audio name exists 

3 User greeting exists 

4 Mailbox space is available 

5 System disk storage space is available 

Based on the condition specified above, the following parameters are also needed. 
VOICE SEGMENT 

Parm 2: Numeric buffer name containing the voice segment id 
Parm 3: Numeric buffer name containing the language code 
AUDIO NAME 

Parm 2: Character buffer name containing the user id 

Parm 3: Numeric buffer name containing the language code
USER GREETING

Parm 2: Character buffer name containing a user id 

Parm 3: Numeric buffer name containing the greeting entry number in the application profile. 
MAILBOX SPACE 

Parm 2: Character buffer id containing a user id 
Parm 3: 1 Check space for new messages. 
2 Check space for saved msgs. 

EDGES 

0 = Condition is true 

1 = Condition is false 
HUP = The caller has hung up 

CheckMailbox 

The CheckMailbox action checks the mailbox of the specified user id for incoming or outgoing mail. The
message headers for the messages that are stored in the data base contain the sender's user
id, the date and time the message was sent, and message attributes such as message type and status. The first
time the user invokes CheckMailbox, the active message number acts as a pointer to the current message
header that is under examination. If the user continues to invoke CheckMailbox, the active message number
acts as a pointer to subsequent message headers in the data base. In effect, when the Check first entry and
Check next entry parameters that are defined in the state table are invoked, the most recent message is played
first, followed by any older messages.
PARAMETERS 
PARM 1: 

1 Check first entry 

2 Check next entry 
PARM 2: (Parm 1 = 1) 

1 Incoming new mail

2 Incoming old mail

3 Outgoing new mail
EDGES

0 = No messages 

1 = Messages retrieved 

2 = Mailbox is busy 

HUP = The caller has hung up. 

UpdateMailbox 

This action updates the message header entries in a message in order to discard or save a received message,
to send messages to other user ids, or to update the message's attributes. For example, with this action,
the user can alter the selection type of a given message (for example, regular, urgent, or emergency), change 
the security level of a given message, or update the receiver id. 
PARAMETERS 

PARM 1: The attribute field to update containing the data to update the field. 
EDGES 

0 = Update complete 

1 = Update failed 

HUP = The caller has hung up. 

UpdateUserProfile 



This action is used to modify the user profile. Some fields of the user profile, such as password and selected language, can be modified by the subscriber; other fields, such as name and number of messages, cannot. UpdateUserProfile allows the selection of the field to update using the parameter field.
PARAMETERS 

PARM 1: User profile field that is to be updated
PARM 2: Buffer name containing data that is to be updated
EDGES
0 = Update complete
1 = Update error 

SendData 

This action is used to send data and/or commands to a Host application through the General Purpose Server.

PARAMETERS 

PARM 1 : Host application function id 

PARM 2: Host application subfunction id 

PARM 3: The number of variables to send
PARM 4 - N: The list of variable ids to send. 
EDGES 

0 = Send completed successfully 

1 = Send error (ASI pb) 

HUP = The caller has hung up.

ReceiveData 

This action is used to parse the data received back from a Host application through the General Purpose Server.

PARAMETERS 

PARM 1 : Host application function id 

PARM 2: Host application subfunction id 

PARM 3: The time out in seconds 
PARM 4: The number of variables to receive

PARM 5 - N: The list of variable ids to receive. 
EDGES 

0 = Receive completed successfully 

1 = Host problem 

2 = No data returned

T1 = Time out (no response) 
HUP = The caller has hung up 

GetFindData 

This action is used to look up tables on Host systems. The user is prompted for an entry with PlayPrompt, and then this action is called to accept the caller's entry. When a specified number of keys is entered, a request is sent to the Host for a list of table entries that begin with this input. After the list is returned, a search is made in the list for a unique match. More keys can be entered by the caller, and the search is repeated until a unique match is found within the list. As soon as a unique match is found, the complete entry is placed in the buffer.

For example, for a list of city names that contains Santa Cruz and Santa Clara, the caller enters the first seven keys of these two city names in order to get a unique match. Once the unique match is found, the entire entry for that city name (in the case of Santa Cruz, the entire entry is nine keys) would be placed in the buffer.
PARAMETERS 

PARM 1: Host application function id 

PARM 2: Host application subfunction id to use to retrieve the data list 
PARM 3: Variable id to accept the data into, as in GetData
PARM 4: The minimum length of the input to accept before sending the data request to the Host
PARM 5: Host time out in seconds to wait for a response to the data request
EDGES

0 = Input successful, and match found

1 = No match
2 = Input incomplete, time out waiting for DTMF input

T1 = Time out, nothing entered
T2 = Last repeat time 
HUP = The caller has hung up. 
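
The unique-match search described for GetFindData can be sketched as follows. This is a minimal illustration, not the specified implementation: the letter-to-key layout (the common ITU E.161 keypad), the skipping of spaces, and the names `to_keys` and `find_unique` are all assumptions made for the example.

```python
from typing import List, Optional

# Assumed keypad layout (ITU E.161); the source does not specify one.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def to_keys(entry: str) -> str:
    """Translate a table entry into the DTMF keys a caller would press.

    Characters without a key (such as spaces) are skipped; this is an assumption.
    """
    return "".join(LETTER_TO_KEY[ch] for ch in entry.upper() if ch in LETTER_TO_KEY)


def find_unique(keys: str, entries: List[str]) -> Optional[str]:
    """Return the single entry whose key sequence begins with `keys`, else None."""
    matches = [e for e in entries if to_keys(e).startswith(keys)]
    return matches[0] if len(matches) == 1 else None


cities = ["Santa Cruz", "Santa Clara"]
print(find_unique("726822", cities))   # six keys: still ambiguous
print(find_unique("7268227", cities))  # seven keys: unique match
```

Under these assumptions the sketch reproduces the Santa Cruz example: six keys match both cities, the seventh key separates them, and the full Santa Cruz entry is nine keys long.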

CallStateTable

This action is used to invoke another state table from within a state table. The other state table is executed until either a CloseSession action or an ExitStateTable action is invoked. CallStateTable is used to implement a series of actions in several state tables. For example, after creating one state table with a series of actions, CallStateTable can be used to execute these actions from other application state tables.

CallStateTable can also be used to implement a menu for a caller to select which application to run. Then 
each application can be written in its own state table and called from the menu state table. 
PARAMETERS 

PARM 1: Variable id containing the state table id to execute 
PARM 2: Variable id containing the state table release

PARM 3: Variable id containing the state table entry edge. 
EDGES 

0 to 12 = Edge returned by action ExitStateTable 
13 = State table not found 

HUP = The caller has hung up.

ExitStateTable 

This action is used to return from a nested state table back to the one that called it. The parameter for this action is the variable id that contains an edge value which CallStateTable is to use as its return edge. ExitStateTable ends the execution of the current level nested state table and returns to the state table it was called from. CloseSession ends execution of all levels of nested state tables, closes the session, and returns the state machine back to Idle state. If this action is called from a state table that is not nested, the CloseSession action is executed instead.
PARAMETERS

PARM 1: Variable id containing the return edge
EDGES None.
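
The nesting behaviour of CallStateTable and ExitStateTable can be sketched as below. This is a simplified model under stated assumptions: state tables are represented as Python callables, and the `Session` class with its `depth` counter is hypothetical. Only the edge value 13 (state table not found) and the fall-back to CloseSession for a non-nested table come from the action descriptions.

```python
STATE_TABLE_NOT_FOUND = 13  # edge listed for CallStateTable


class Session:
    """Hypothetical model of one call session running nested state tables."""

    def __init__(self, tables):
        self.tables = tables  # assumed: table id -> callable(session) returning an edge
        self.depth = 0        # current nesting level
        self.closed = False

    def call_state_table(self, table_id):
        table = self.tables.get(table_id)
        if table is None:
            return STATE_TABLE_NOT_FOUND
        self.depth += 1
        try:
            return table(self)  # runs until the table exits or closes the session
        finally:
            self.depth -= 1

    def exit_state_table(self, edge):
        # From a table that is not nested, CloseSession is executed instead.
        if self.depth <= 1:
            self.close_session()
        return edge

    def close_session(self):
        self.closed = True


def fares(session):
    return session.exit_state_table(3)


def menu(session):
    edge = session.call_state_table("fares")  # nested call returns edge 3
    return session.exit_state_table(edge)     # menu is top level: closes the session


session = Session({"menu": menu, "fares": fares})
session.call_state_table("menu")
```

Here the menu table plays the role of the application-selection menu mentioned above: each application lives in its own table, and the edge it exits with is handed back to the caller.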



Disconnect 






This action is used to disconnect a caller. Disconnect is used only for specific purposes because a normal 
end-of-session occurs when the caller hangs up. It can be followed by a CloseSession action or other actions. 
PARAMETERS NONE REQUIRED. 
EDGES 
0 = Complete.
CloseSession

This action clears all buffers used in the preceding session. It is used as the last action of a session. 
PARAMETERS NONE REQUIRED. 
EDGES 

0 = Complete 

1 = Not possible. This process is deactivated. 

APPLICATION SCENARIOS 

This section introduces the application scenarios. The first part discusses the functional characteristics of 
the application scenarios. The second part illustrates the use of these applications. 

FUNCTIONAL CHARACTERISTICS 



The VPACK sends the ABCD signalling bits to the voice card driver where, based on the country-specific 
signalling translation table, the driver translates the bits into the appropriate state and passes the state to the 
Channel Process or Node Manager. The out-of-band call information signalling includes, at a minimum, the 
called number. When this call information arrives or a time out condition occurs, the channel goes off-hook and the
session begins.

If signalling information is unavailable, for example, a not capable, or fails to arrive within the defined time 
limit after ringing begins, the call is answered and default conditions are assumed. An example might be a cus- 
tomer who wishes to order supplies. The customer is directed to a particular line group, and then navigates 
through a preliminary voice menu to select the specific VIS application to run. When the caller is connected to 
his/her application, the application script may present a greeting and prompt for a password. The password
is verified by querying the application data base server, and the script proceeds with voice menus, prompting 
for DTMF responses. 

Host interaction is carried out by passing all coded DTMF responses, other than navigational responses, 
to the data base server. The customer-written data base server may field the request directly or simply pass 
the request and response to a remote data base server. In this manner, the customer may elect to use existing
data base applications and interface protocols. 

APPLICATION EXAMPLES 

The following sections contain examples of application scenarios: a VIS application, a telephone service application, and workstation applications.

VIS Application - Bus Schedules and Fares 

This scenario is an example of a voice interactive data base application. In this scenario, a consumer calls a listed phone number for a bus schedule and fare service. Based on the called number identification, the bus schedule script is loaded. In the following system script answer, the words in caps are variables generated from the VAG.

"Hello, this is Global Bus Lines route and fare system. You may back up a step at any time by pressing the star key on your touch tone phone. To begin, please turn to page THREE of your phone book for the bus terminal location codes. Using the touch tone keypad on your telephone, please enter the THREE-digit terminal code for the city where your travel begins."



(Customer enters three-digit terminal code.)
"You are travelling from BOSTON."

"Please enter the THREE-digit code for the location where your travel will terminate."
(Customer enters three-digit terminal code.)
"You will be travelling to ATLANTA."

"Please enter the day you wish to travel. One is Monday and seven is Sunday."
(Customer enters one-digit day code.)
"You will be travelling on TUESDAY."

"What time of the day do you want to leave BOSTON? Please enter up to four digits using the 24-hour clock, followed by a # sign."

(Customer enters departure time.)

"Departure time from BOSTON is OH EIGHT HUNDRED hours. Arrival time in ATLANTA is TWENTY FIFTEEN hours. Fare is EIGHTY-FIVE dollars and SIXTY-EIGHT cents. Press the pound sign for the next departure or the star key to respecify your trip. If you need further help, press zero then the # sign, and you'll be transferred to a live operator."

Telephone Service Application 

The flowchart in Figure 10 illustrates the progression of a telephone service application. From a selection of customer-installed options, the customer is requesting that her telephone service be suspended for a period of time. After being welcomed to the system at function block 700, she is prompted at function block 710 to enter her telephone number and any pertinent information relevant to the task she is requesting. This information is sent to the Host at function block 720 for verification and implementation.

If the customer has entered invalid information, she is prompted to re-enter the data or call a help number at function block 730. If the customer has entered valid information, she receives a message to suspend or restore telephone service at function block 740.

After confirming her option to suspend, she is prompted at function block 750 to enter two telephone numbers and the dates she wants her service to be suspended. One telephone number is the number she is calling from and the other is a number where she can be reached in case of an emergency.

This data is sent to the Host at function block 760, where it is processed and stored. Then, a message restating the customer's requirements along with a request for confirmation is sent back to the customer as shown in function block 770. At this point, she can confirm the suspension requirements or transfer to an operator.

Workstation Applications 

This section includes examples of the following workstation applications: answering and message taking, 
message retrieval, message recording, and message delivery. 

ANSWERING AND MESSAGE TAKING: 

In an answering and message taking application, for a call forwarded to the VAE, the subscriber's greeting is given to the caller, a message is recorded and stored in the subscriber's mailbox, and the call is terminated.

1. Depending on the profile, the script causes either a standard or personalized greeting to be played. If personalized, the greeting must be fetched from the data base. The compressed greeting is retrieved in segments, decompressed, and then sent on the outbound circuit, with successive segments fetched in pipelined fashion.

2. After the greeting is played, the inbound circuit is opened to receive voice signals while monitoring for on-hook and DTMF tones.

3. As the caller's spoken message arrives, it is compressed into segments and stored in the data base. As each segment is complete, it is assigned a unique identifier.

4. If a key is pressed, the digit is extracted from the stream and processed against the script. This action is either disregarded or followed by a spoken prompt.

5. If the caller hangs up, or if a predetermined time limit is reached, the script causes the ASI to disconnect the telephone circuit. The last message segment is stored and the requisite indexes that are stored in the data base are updated to reflect the addition of the message to the subscriber's mailbox.

6. If the class of service for the subscriber dictates that the arrival of this message requires a notification, the ASI sends a control message to an application in the Host requesting that a notification be scheduled.

MESSAGE RETRIEVAL: 

In a message retrieval application, the subscriber calls a service number to gain access to the mailbox services. The most common service used is the retrieval of messages from the mailbox.

1. The incoming call is received at the ASI, together with the called number identifying the service. The
appropriate script is selected and activated. 

2. The script causes a prompt to be played to the subscriber requesting a keyed identification and password. 

3. The DTMF decoding function in the ASI extracts digits and the control function uses them to ensure that correct steps are followed and the pertinent information is collected. Corrective prompts are played, when necessary, and the on-hook tone is monitored.

4. When enough information is collected, the presumed profile is accessed from the data base and the 
password is verified. Upon verification, a prompt is played giving the options of services available. The sub- 
scriber then selects the mailbox retrieval. 

5. The ASI requests messages from the data base and constructs a prompt that gives the number of mes- 
sages. This prompt, along with the service options, is played to the subscriber. 

6. The subscriber's keyed response causes the messages to be retrieved from the data base. The subscriber can select from two different options: the longer version where he can listen to the entire message, or the shorter version where he can scan the mailbox for the message he wants to listen to. The flow for each option is:

O Long version 

- Play message header 

- Play message 

- Delete the message 

- Go on to the next message. 
O Short version 

- Play up to four message headers 

- Select the message number of the message he wishes to listen to, or skip forward to the next four 
message headers. 

7. Message information is retrieved in segments, decompressed, and played back. The retrieval is 
scheduled sufficiently ahead of the playback rate to avoid interruptions in the regenerated voice. During 
playback, the subscriber has extensive key-invoked capabilities for review and spacing. 

8. After playback of each message, a disposal prompt is given. Choices include deletion, retention, and 
forwarding. Extensive prompting is available for each option. 

9. After retrieving messages, through key-controlled navigation, the subscriber may go to another option on the main menu. If this is done, or if a disconnection occurs, the retrieval transaction is over. The message data base is updated, and the relevant indexes and the subscriber related information are changed to reflect the actions taken.

MESSAGE RECORDING: 

In a message recording transaction, the subscriber records a message to be placed in the mailboxes of 
other subscribers. The subscriber gains access to this service by calling the service access number. 

1. The called number activates the script. The subscriber logs on and, using the keypad, selects the recording function from the spoken menu.

2. The steps taken to receive, compress, and store the message are the same as those taken for a forwar- 
ded call. Review and edit capability may be invoked with the keypad. The operation may be abandoned 
at any time by either keyed selection or disconnection. 

3. When the end of the message is reached, a prompt that asks for the destination is played. This is keyed as either a single target or a list of targets. The designation may be an explicit phone number or an alias that references a predefined directory. A list is always aliased. The ASI builds the requisite message index entries and causes the Host server to receive the updates into the mail data base.

4. Again, the transaction is ended with keypad navigation to other menu options or with the subscriber going 
on-hook. 

MESSAGE DELIVERY: 

When a subscriber accesses VAE in direct-access mode, he can record a message. Then, he must specify 
when to send the message and the destination of the message. 

The VAE has two message delivery types: immediate and scheduled. For the immediate delivery type, the default option is that the message is delivered to the receiver mailbox as soon as it is sent. For the scheduled delivery type, VAE allows the option of specifying the date (time, day, month) for the message to be sent. The VAE allows two receiver types. The first is a subscriber identified by a user id (telephone number) or name. The second is a distribution list containing a list of subscribers.

The VAE architecture permits message delivery to a non-subscriber. The system can dynamically create dummy user profiles and mailboxes to store information. Then, through outcalling capability, which is implemented in later releases, the VAE can deliver the message to the receiver. After message delivery, the VAE deletes the dummy information. This function may be implemented in future releases.



SESSION MANAGER DESIGN 



The Session Manager design includes the following functions:
O Channel Processes;
O Internal State Machine including its internal actions;
O Application Actions; and
O National Language Support for the PlayPrompt action.

This section provides the design specifications for these functions. At system startup, the Node Manager will FORK() and EXEC() a single channel process. The first channel process acts as the Session Manager.

The Session Manager performs all the initialization steps that are common to all the channel processes, then forks these channel processes for the entire ASI system, and after that, it goes back to being a channel process. The purpose of the Session Manager is to reduce system startup time and provide a simple means of sharing code space for all the Channel Processes. It does this by performing all initialization code that is common to all Channel Processes, and then using the FORK() function to generate the required number of processes.

Application Actions

This section describes the Session Manager application actions. They are:

O PlayPrompt
O GetKey
O GetData
O RecordVoice
O PlayVoice
O CheckStorage
O Disconnect
O CloseSession.

ACTION_PLAY_PROMPT ROUTINE:

This action builds and plays the voice prompts that are defined in the prompt directory. Prompts consist of the following items:

O timeout, in seconds, for the user to respond to the prompt at the next GETDATA or GETKEY action;
O number of times a prompt can be repeated before the GETDATA or GETKEY action returns a timeout; and
O list of segment id's, system variables, and conditional tests. The conditional tests allow control over what segments and variables are to be played for a particular prompt based on conditions at run-time.
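
The three items that make up a prompt can be pictured with a small data structure. This is only a sketch under assumptions: the class names `Prompt` and `PromptItem`, the representation of a conditional test as a Python callable, and the segment names used in the example are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class PromptItem:
    name: str                  # segment id, or the name of a system variable
    is_variable: bool = False  # True when the run-time value is played instead
    # Optional conditional test evaluated against run-time variables.
    condition: Optional[Callable[[Dict[str, int]], bool]] = None


@dataclass
class Prompt:
    timeout_seconds: int  # time allowed at the next GETDATA or GETKEY action
    max_repeats: int      # repeats before a timeout is returned
    items: List[PromptItem] = field(default_factory=list)

    def build(self, variables: Dict[str, int]) -> List[str]:
        """Return the pieces to play, skipping items whose test fails."""
        played = []
        for item in self.items:
            if item.condition is not None and not item.condition(variables):
                continue
            played.append(str(variables[item.name]) if item.is_variable else item.name)
        return played


new_mail = Prompt(
    timeout_seconds=10,
    max_repeats=3,
    items=[
        PromptItem("you_have"),
        PromptItem("new_count", is_variable=True),
        PromptItem("urgent_notice", condition=lambda v: v["urgent"] > 0),
    ],
)
print(new_mail.build({"new_count": 2, "urgent": 0}))
```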



INTERNAL STATE TABLE



The State Machine internal state table provides the State Machine with the basic rules to run the IDLE, ANSWERCALL, PLACECALL, and ENDCALL actions. This table is the foundation of the State Machine. All customer defined state tables are invoked by the ANSWERCALL and actions. Figure 11 shows the internal state table.



System Parameters 



The Channel Processes system parameters are:
O Number of channel processes to run
O Internal state table
O Process control block
O Blocking number for voice I/O 4K buffers
O Timeout number of seconds before time limit is reached
O Voice message record stop key
O Number of records to chop off at the end of recording (leading edge of the DTMF key).

Error Recovery

All errors are logged and/or the Node Manager is notified. Requested UP is unobtainable and request UP 
for no SMSI information is also unobtainable. For miscellaneous errors, state tables could be defined to handle 
most error conditions and would provide flexibility without requiring code changes. 

NATIONAL LANGUAGE SUPPORT FOR THE PLAYPROMPT ACTION 

National Language Support (NLS) setup is a set of rules and programs that play a complex variable in a way that depends on local custom. This means that the complex variable can be played in a number of different ways, thus supporting numerous languages and different language syntax.

The format of data input and output depends on local custom. Such data includes the numbering system, currency, date formats, time formats and telephone number formats. NLS uses a table-driven design. This eliminates the need to write a new program for each new language or language syntax supported by the system.

Using this approach, the code is independent from the language or syntax; only a new table is required. 
Examples of complex variables are: 
O Numbers 

O Ordered numbers (first, second, and so on) 
O Date 
O Time
O Currency 
O Telephone numbers. 

How the Complex Variable is Played 

Figure 12 shows how the complex variable is played. Examples of different language syntax are:

Number: 20 is expressed as <20> or <2><10> depending on the country.
Date: 6/6/90 is expressed as: <6><19><90>; <June><90>; <6><June><1990>; <the><6th><of><June><19><90>; or <national symbol voice><year number><year><6><month><6><day>.

Time: 12 hour time or 24 hour time
Currency: $5.25
O <5><dollars><25><cents>
O <5><dollars><and><25><cents>
O <5><yen><25><fen>

Telephone number: 9861234
O <9><8><6><pause><1><2><3><4>
O <98><61><234>

NLS Rules 

The language syntax dependencies are based on a set of NLS rules. NLS rules are created by the System Administration Facility (SAF) and loaded by the Node Manager into SYSPARMs. NLS routines use these rules to break down the complex variables into pieces that play the appropriate voice segment.

The NLS rule has a general form in a character array of: <rule meta character><qualifier><optional voice segment(s)>,... The <optional voice segment> can be used anywhere in the specification array except between the meta character and its qualifier. More than one <optional voice segment> can be used in sequence.

The NLS rules are stored in SYSPARMs and loaded by the Node Manager; one set is used per language. 
Figure 13 is a table depicting the NLS parameter numbers corresponding to the rules set forth below. 

NLS Rule Characters and Qualifiers 

The following are the basic elements used in defining the format of a date, time, currency, or telephone number. This is the general form of all NLS specifications:
<rule meta character><qualifier><voice segid> ....

/ - meta character for words of numbers
    9 - <billion> for value of 1,000,000,000
    8 - if there is a single word for 100,000,000
    7 ... 1, only if there is a word for the value
d - day of the month
    1 - play as PP_VARID_NUMBERS (number)
    2 - play as PP_VARID_ORDINAL (order)
    3 - play as PP_VARID_DAYSOFMONTH
m - month of the year
    1 - play as PP_VARID_NUMBERS (number)
    2 - play as PP_VARID_ORDINAL (order)
    3 - play as PP_VARID_MONTHNAMES
y - year
    1 - year played as single number
    2 - year played as century and decade
    3 - year played as decade only
    4 - use the national year
w - day of the week; similar qualifier as d
v<ULONG Segment Id> - play segment
h - hour in the day
    1 - 24 hour format as number
    2 - 12 hour (AM/PM) format as number
M - minutes
    1 - play as PP_VARID_NUMBERS (number)
    2 - play as PP_VARID_ORDINAL (order)
s - seconds
    1 - play as PP_VARID_NUMBERS (number)
    2 - play as PP_VARID_ORDINAL (order)

The following meta characters always have the qualifier '0':
$ - currency unit (dollar)
c - currency fraction (cents)
D - play telephone number as digits
g - play telephone number as number by grouping

Variable Mapping Table 

35 

As mentioned before, variables such as numbers, dates, or telephone numbers are considered complex 
variables. Based on the corresponding playing rules, these variables are broken down into smaller pieces to 
play as prompts. This process is accomplished using the PlayPrompt action. To identify these primitive pieces, 
a variable mapping table is used. 
All primitive variables are located by using two keys: var_id and var_num. Var_id defines the primary data type, such as PP_VARID_NUMBERS and PP_VARID_MONTHNAMES. Var_num is usually the enumerated value within the type. For example, January has the var_num value of one.

To apply this rule to all 256 possible languages, an NLS variable mapping table is defined in the Host data base for each supported language. This table is organized with three columns:
O Var_id - such as PP_VARID_MONTHNAMES (short)
O Var_num - such as January, enumerated with short type
O segid - voice segment id

This table is loaded by the Channel Process as part of prompt directory loading for the current language used. The following list shows all the supported variable types:

VARID                    ITEM CONTENTS
PP_VARID_NUMBERS         1 to 20, 30, 40, 50, 60, 70, 80, 90
PP_VARID_HTMB            hund., thou., mill., bill.
PP_VARID_MONTHNAMES      Jan., Feb., Mar., ...
PP_VARID_DAYSOFWEEK      Sun., Mon., Tue., Wed., Thu., Fri., Sat.
PP_VARID_DAYSOFMONTH     1st, 2nd, 3rd, 4th, ..., 31st
PP_VARID_TIMEOFDAY       am, pm, o'clock, hours
PP_VARID_TIMEOFWEEK      yesterday, today, tomorrow
PP_VARID_ALPHABET        A, B, C, D, ...
PP_VARID_NOISE           noise type
PP_VARID_SYS_MSGS        voice for system error messages
PP_VARID_ORDINAL         ordinal number
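
The two-key lookup into the variable mapping table can be sketched as a dictionary keyed by (var_id, var_num). The segment id values and the function names `load_varmap` and `lookup_segment` below are illustrative assumptions; only the three-column layout and the January example come from the description above.

```python
from typing import Dict, Iterable, Tuple

VarMap = Dict[Tuple[str, int], int]


def load_varmap(rows: Iterable[Tuple[str, int, int]]) -> VarMap:
    """Build one language's table from (var_id, var_num, segid) rows."""
    return {(var_id, var_num): segid for var_id, var_num, segid in rows}


def lookup_segment(varmap: VarMap, var_id: str, var_num: int) -> int:
    """Locate the voice segment id for a primitive variable."""
    return varmap[(var_id, var_num)]


# Hypothetical rows for one language; the segment ids are made up.
english = load_varmap([
    ("PP_VARID_MONTHNAMES", 1, 5001),  # January
    ("PP_VARID_MONTHNAMES", 6, 5006),  # June
    ("PP_VARID_NUMBERS", 20, 1020),
])
print(lookup_segment(english, "PP_VARID_MONTHNAMES", 1))
```

Keeping one such table per language is what lets the playing code stay the same while only the table changes.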



NLS Design Assumptions and Limitations 

The following lists the current NLS design features: 
O Number (example for American English)
- Base 10 (no 24 = 2 dozens, base 12)
- Only valid entry in NUMSPEC allowed (no 10)
- Primitive numbers are completed in VARMAP table (0, 1, ... 19, 20, 30, ... 90, etc.)

O Date/time inputs in UNIX format

O Currency inputs are floating point numbers

O Telephone numbers are always played as numbers

O VARMAP Table
- Completeness is expected from SAF
- No redundancy, such as 10 in NUMSPEC in American English; <billion> is not used for <1000><million> in British English

O NLS rule tables must be correct

O No ordinal unit for <millionth> and <billionth>; VARMAP will have <hundredth> and <thousandth>.
NUMBER SYSTEM: 

There are two operations used to break a number into pieces that can be played as segments: division and subtraction. VAG creates the number which allows the Channel Process to do these operations. The number formats assume a base 10, with the most significant digit on the left. The range of numbers is from 0 to 4,294,967,295 (size of ULONG), which means that there are at the most 10 significant digits. This internal data structure is not seen by the VAG user:
char numspec[60];

The following are the basic elements used in defining the format of a number:
/ - meta character for words of numbers
    9 for <billion>
    6 for <million>
    3 for <thousand>
    2 for <hundred>

v - followed by 4-byte voice segment ids
Examples of numspec generated for:

(1) American English

/9v<billion>/6v<million>/3v<thousand>/2v<hundred>

(2) British English

/6v<million>/3v<thousand>/2v<hundred>
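
The division-and-subtraction breakdown can be sketched as follows. This is an illustration under assumptions, not the Channel Process code: the numspec is parsed from the bracketed notation of the examples (a real numspec carries 4-byte segment ids after `v`), pieces are written as `<...>` strings, and the function names are hypothetical. The primitive numbers 0 to 20 and the tens 30 to 90 are taken from the VARMAP description.

```python
import re
from typing import List, Tuple


def parse_numspec(numspec: str) -> List[Tuple[int, str]]:
    """Parse '/9v<billion>/6v<million>...' into (power of ten, unit word) pairs."""
    return [(int(exp), f"<{word}>")
            for exp, word in re.findall(r"/(\d)v<([^>]+)>", numspec)]


def number_to_segments(n: int, units: List[Tuple[int, str]]) -> List[str]:
    """Break n into playable pieces by repeated division and subtraction."""
    if n == 0:
        return ["<0>"]
    out: List[str] = []
    for exp, word in units:
        unit = 10 ** exp
        count = n // unit              # division: how many of this unit
        if count:
            out += number_to_segments(count, units) + [word]
            n -= count * unit          # subtraction: remove what was played
    if n:                              # remainder is below one hundred
        if n <= 20 or n % 10 == 0:
            out.append(f"<{n}>")       # primitive number in VARMAP
        else:
            out += [f"<{n - n % 10}>", f"<{n % 10}>"]
    return out


american = parse_numspec("/9v<billion>/6v<million>/3v<thousand>/2v<hundred>")
british = parse_numspec("/6v<million>/3v<thousand>/2v<hundred>")
print(number_to_segments(2_000_000_000, american))
print(number_to_segments(2_000_000_000, british))
```

With the British numspec, which has no entry for 9, the same value comes out as a count of thousands followed by <million>, matching the <1000><million> convention noted above.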

This numbering scheme is applicable for both cardinal and ordinal number systems. The only difference between the two systems is using var_id for locating the voice segment. There are two sets of the structure, one for the cardinal number system, and another for the ordinal number system.
DATE FORMAT: 

The order of the day, month and year along with their voice segment ids are given in this structure.

Date Specification:

- datefmt specifies the play order of day, month and year. If national year is used, then SYSPARM 'national_year' contains the base year.
d - day of month
    1 play as PP_VARID_NUMBERS (number)
    2 play as PP_VARID_ORDINAL (order)
    3 play as PP_VARID_DAYSOFMONTH
m - month of year
    1 play as PP_VARID_NUMBERS (number)
    2 play as PP_VARID_ORDINAL (order)
    3 play as PP_VARID_MONTHNAMES
y - year
    1 year played as single number
    2 year played as century and decade
    3 year played as decade only
    4 use the national year
w - day of week; similar qualifier as d
v<ULONG Segment Id> - play segment

The following example shows how the date, 6/6/90, is played.

specification          played
m3d2y2                 <June><6th><19><90>
m3d2y3                 <June><6th><90>
d2m3y1                 <6th><June><1990>
<the>d2<of>m3y2        <the><6th><of><June><19><90>

<nsv>y4<year>m1<month>d1<day>
is played as:

<national symbol voice><year number><year><6><month><6><day>
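
The date examples can be reproduced with a short interpreter for the datefmt string. This is a sketch under assumptions: inline voice segments are written in the bracketed form used in the examples rather than as `v` plus a 4-byte id, English month names and ordinal suffixes stand in for VARMAP lookups, and the national year (y4) and day of week (w) are left out because the source does not define how they are computed.

```python
import re
from typing import List

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def ordinal(n: int) -> str:
    """English ordinal such as 6th or 21st (stand-in for PP_VARID_ORDINAL)."""
    if 10 <= n % 100 <= 20:
        return f"{n}th"
    return f"{n}" + {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def play_date(datefmt: str, day: int, month: int, year: int) -> List[str]:
    """Return the pieces played for a date under the given datefmt."""
    out: List[str] = []
    for tok in re.findall(r"<[^>]+>|[dmy][1-4]", datefmt):
        if tok.startswith("<"):
            out.append(tok)                      # inline voice segment
        elif tok == "d1":
            out.append(f"<{day}>")
        elif tok in ("d2", "d3"):
            out.append(f"<{ordinal(day)}>")
        elif tok == "m1":
            out.append(f"<{month}>")
        elif tok == "m2":
            out.append(f"<{ordinal(month)}>")
        elif tok == "m3":
            out.append(f"<{MONTHS[month - 1]}>")
        elif tok == "y1":
            out.append(f"<{year}>")
        elif tok == "y2":
            out += [f"<{year // 100}>", f"<{year % 100}>"]
        elif tok == "y3":
            out.append(f"<{year % 100}>")
        else:
            raise NotImplementedError(f"qualifier not covered by this sketch: {tok}")
    return out


print("".join(play_date("<the>d2<of>m3y2", 6, 6, 1990)))
```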



TIME FORMAT: 



The play order of time units and their voice segment ids are given in this structure.
Time specification
char timefmt[20];

timefmt specifies the order of playing for time elements:
h - hour in the day
    1 - 24 hour format as number
    2 - 12 hour (AM/PM) format as number
M - minutes
    1 play as PP_VARID_NUMBERS (number)
    2 play as PP_VARID_ORDINAL (order)
s - seconds
    1 play as PP_VARID_NUMBERS (number)
    2 play as PP_VARID_ORDINAL (order)

v - if a voice needs to be inserted in between, then the next four bytes contain the voice segment id.
The following example shows how the time, 11:30:24, is played.

specification              played
h1M1<and>s1<second>        <11><30><and><24><second>
h2M1                       <11><30><am>
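
A matching sketch for the time format follows. It rests on assumptions: inline voice segments use the bracketed notation of the example, only qualifier 1 is handled for minutes and seconds, and the am/pm piece is appended after all other pieces because that is where the one example with h2 places it.

```python
import re
from typing import List, Optional


def play_time(timefmt: str, hour: int, minute: int, second: int) -> List[str]:
    """Return the pieces played for a time of day under the given timefmt."""
    out: List[str] = []
    suffix: Optional[str] = None
    for tok in re.findall(r"<[^>]+>|[hMs][12]", timefmt):
        if tok.startswith("<"):
            out.append(tok)                   # inline voice segment
        elif tok == "h1":
            out.append(f"<{hour}>")           # 24 hour format
        elif tok == "h2":
            out.append(f"<{hour % 12 or 12}>")  # 12 hour format
            suffix = "<am>" if hour < 12 else "<pm>"
        elif tok == "M1":
            out.append(f"<{minute}>")
        elif tok == "s1":
            out.append(f"<{second}>")
        else:
            raise NotImplementedError(f"qualifier not covered by this sketch: {tok}")
    if suffix:
        out.append(suffix)                    # assumed position: after all pieces
    return out


print("".join(play_time("h1M1<and>s1<second>", 11, 30, 24)))
print("".join(play_time("h2M1", 11, 30, 24)))
```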

CURRENCY SPECIFICATION: 

Two monetary units are allowed; they are dollars and cents. The amount is played as a number with the voice segments of monetary units to make up the currency.

Currency specification
char moneyfmt[20];    /* how to play dollar/cent */
short dollar2cent;    /* converting ratio */

- moneyfmt: this field defines the play order of the amounts and their monetary unit voices.

$ play the dollar amount
    0 no qualifier, always played as numbers
c play the cent amount
    0 no qualifier, always played as numbers
v insert a voice in between, followed by its 4-byte segment ID.

- dollar2cent: conversion ratio from dollar to cent. For American currency, 1 US dollar = 100 cents, and this value is 100. This is a SYSPARM.

The following example illustrates how the amount of currency, $5.25, is played. 

specification played 

$<dollars>c<cents> <5><dollars><25><cents> 

$<dollars><and>c<cents> <5><dollars><and><25><cents> 

$<yen>c<fen> <5><yen><25><fen> 

calling sequence: short play_currency(float money) returns true if play is successful. The VAG provides all information except the voice segment ids for dollar(s) and cent(s), which will be initialized by the Channel Process.
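
The currency examples can be sketched in the same style. The assumptions here are that moneyfmt is written with bracketed unit segments as in the examples, that the '0' qualifier may be present or omitted, and that each amount is emitted as a single `<n>` piece rather than being broken down further by the number rules. Unlike the C calling sequence, this Python function returns the pieces instead of a success flag.

```python
import re
from typing import List


def play_currency(moneyfmt: str, money: float, dollar2cent: int = 100) -> List[str]:
    """Return the pieces played for an amount under the given moneyfmt."""
    total = round(money * dollar2cent)         # whole number of minor units
    dollars, cents = divmod(total, dollar2cent)
    out: List[str] = []
    for tok in re.findall(r"<[^>]+>|[$c]0?", moneyfmt):
        if tok.startswith("<"):
            out.append(tok)                    # monetary unit voice segment
        elif tok[0] == "$":
            out.append(f"<{dollars}>")         # major unit amount
        else:
            out.append(f"<{cents}>")           # minor unit amount
    return out


for fmt in ("$<dollars>c<cents>", "$<dollars><and>c<cents>", "$<yen>c<fen>"):
    print("".join(play_currency(fmt, 5.25)))
```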

TELEPHONE SPECIFICATION: 

The play grouping method of a telephone number is given in this structure. Within the group, the telephone number is played as numbers.

char phonespec[20] = D0D0D0 D0D0D0 D0D0D0D0

calling sequence: short play_phone_number(char *phone)

phonespec is used to specify the rule to play a telephone number: D, g, or v
D - play as a digit, followed by a zero or additional D's for additional digits
    0 no qualifier
g - play as a group, followed by a zero or additional g's
    0 no qualifier
v - insert a voice between play. This must be followed by a 4-byte voice segment id.
' ' - used as a separator.

Given phone number 4085546888:

specification                 played
D0D0D0 D0D0D0 D0D0D0D0        <4><0><8><5><5><4><6><8><8><8>
g0g0g0 g0g0g0 g0g0g0g0        <408><554><6888>
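
The two telephone examples can be sketched as below. This is an illustration under assumptions: the separator is taken to be a blank, the `v` voice insertion is not handled, and consecutive `g` positions within a group are joined into one number while each `D` position is played as a single digit.

```python
import re
from typing import List


def play_phone_number(phonespec: str, phone: str) -> List[str]:
    """Return the pieces played for a telephone number under phonespec."""
    digits = iter(phone)
    out: List[str] = []
    for group in phonespec.split(" "):
        run = ""                               # digits collected for a 'g' run
        for kind in re.findall(r"([Dg])0", group):
            digit = next(digits)               # one phone digit per spec position
            if kind == "g":
                run += digit
            else:
                if run:                        # close any open group first
                    out.append(f"<{run}>")
                    run = ""
                out.append(f"<{digit}>")
        if run:
            out.append(f"<{run}>")
    return out


print("".join(play_phone_number("D0D0D0 D0D0D0 D0D0D0D0", "4085546888")))
print("".join(play_phone_number("g0g0g0 g0g0g0 g0g0g0g0", "4085546888")))
```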

STATE TABLE MANAGER DESIGN 

The purpose of the State Table Manager is to provide state tables to the Session Manager program in order to control the progression of a call. The State Table Manager is designed to receive messages from a request queue, send a DPRB to the Data Base Interface Manager requesting a state table, receive the notification that the state table has arrived, and notify the requestor. The program is designed to handle multiple requests concurrently, and is configured to handle invalid requests and tables that cannot be retrieved.

The State Table Manager program also responds to requests from the Node Manager to release 4K buffers and requests to invalidate state tables. This section describes the following:

O Performance considerations 

O Resources 

O The interface for requestors of state tables 

O The pseudo code that defines the State Table Manager Design. 

The State Table Manager and the Prompt Directory Manager share a common table structure and control 
table design. Figure 14 is a block diagram showing the data flow including interfaces with other processes and 
the control table. 

COMMON ROUTINES DESIGN 

Because the control table structure is identical (except for the entry control structure), many of the functions 
required for both the State Table Manager and the Prompt Directory Manager are consolidated into routines 
that both managers call. 

The Control Table consists of a block of memory large enough to contain enough entries to allow for a
run-time-determined number of tables in memory at one time. The memory for the table is allocated from the AIX
memory pool during system initialization such that the table is contiguous memory. Semaphore operations are
used to prevent more than one process from updating the control table at one time.
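
A minimal Python sketch of such a control table follows. It is an illustration only: a fixed-size slot list stands in for the contiguous block allocated at initialization, and a threading.Lock stands in for the inter-process semaphore. The class and method names are hypothetical and do not appear in the text.

```python
import threading


class ControlTable:
    """Fixed-capacity table of (table_id, table) entries guarded by a lock."""

    def __init__(self, max_entries: int):
        # Capacity is decided once, at initialization time.
        self._slots = [None] * max_entries
        self._lock = threading.Lock()  # stand-in for the semaphore

    def add(self, table_id, table) -> int:
        """Store a table in the first free slot and return the slot index."""
        with self._lock:
            for index, slot in enumerate(self._slots):
                if slot is None:
                    self._slots[index] = (table_id, table)
                    return index
        raise MemoryError("control table is full")

    def invalidate(self, table_id) -> bool:
        """Free the slot holding table_id; return False if it was not present."""
        with self._lock:
            for index, slot in enumerate(self._slots):
                if slot is not None and slot[0] == table_id:
                    self._slots[index] = None
                    return True
        return False
```

Only one caller at a time can hold the lock, so concurrent updates cannot interleave; a multi-process implementation would need a system-level semaphore instead.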

VAG COMPONENT DESIGN SPECIFICATIONS 

This chapter contains the detailed design specifications for the VAG internal components. As shown in
Figure 2, these components and their design specifications include:

O VAG Performance Considerations

O VAG Resources

O VAG Server Interface Specifications

O VAG Use of Motif and X-Windows

O VAG Global Data Structures

O VAG Specific Global Variables

O Front-End Design

O Prompt Generator Design

O State Generator Design

O Voice Generator Design

O Application Manager Design

O VAG System Specifications

O Utilities.

The performance of the VAG components is subordinate to the call processing functions of the system.
This requires the VAG components to operate at a lower priority than call processing tasks and to limit their
competition for system resources with the rest of the VAE system. The VAG components are expected to operate
during normal loading times without degrading system performance.

VAG RESOURCES 

Total available RAM and CPU processing power impact the performance of the VAG components.

A large portion of the required memory is allocated for code and data structures for Motif and X-Windows. Both 
Motif and X-Windows are window interface managers for AIX. Future versions of these products are expected 
to provide shared libraries to reduce the memory requirements for each task. 

Both Motif and X-Windows make extensive use of allocated storage. Because of design decisions to transfer
complete files to the VAG components instead of accessing individual records and due to the way that the
Motif scrollable lists function, large amounts of memory will be allocated, dynamically, by the application. The
amount of memory allocated will depend on the actual size of the data base files that are edited.

The VAG programs are not expected to be locked in memory like call processing tasks, but instead operate as
standard demand-paged virtual memory processes. This reduces the demand on the system's RAM that is
required to run the VAG components. The VAG components must release any resources shared with call
processing, for example 4K buffer blocks, as rapidly as possible.

The displaying of graphics is CPU intensive because the X-Windows low-level graphics functions perform
floating-point calculations. The floating-point performance of the target platform therefore has a marked impact on
the display speed.


FRONT-END DESIGN 

The purpose of the Front-End design is to enable or disable a security access level based on the user's
id and password. At the beginning of a work session, from the VAE top-level default screen, the Front-End
design allows the user to select the Password pop-up box only.

When the user enters his user id and password in the Password pop-up box and selects APPLY, the Front-End
design allows the user to access only those menus that are assigned to him in the Administrator Profile.
The password is not displayed as the user enters it. The enabling or disabling of the security access level is
controlled by setting the sensitivity of specific menu buttons. Sensitivity also determines whether keystrokes
or mouse actions can invoke the function associated with the button.
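
The sensitivity mechanism can be pictured with a short Python sketch. It is not Motif code: the MenuButton class, the apply_profile function and the menu names are hypothetical stand-ins used only to show how a profile can gate which buttons respond to input.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MenuButton:
    name: str
    action: Callable[[], str]
    sensitive: bool = False  # an insensitive button ignores keystrokes and mouse clicks

    def activate(self) -> Optional[str]:
        """Invoke the button's function only if the button is sensitive."""
        return self.action() if self.sensitive else None


def apply_profile(buttons: list, allowed_menus: set) -> None:
    """Make sensitive exactly those buttons that the user's profile allows."""
    for button in buttons:
        button.sensitive = button.name in allowed_menus


if __name__ == "__main__":
    menus = [MenuButton("Prompts", lambda: "prompt editor"),
             MenuButton("Voices", lambda: "voice editor")]
    apply_profile(menus, {"Prompts"})
    print([button.activate() for button in menus])
```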

PROMPT GENERATOR DESIGN 

The purpose of the Prompt Generator is to allow a system user to create, update and delete the prompts
executed by a channel process. The Prompt Generator provides a graphic user interface, using Motif, to accomplish
these tasks. The Prompt Generator creates, updates, and deletes prompts through a DPRB interface with the Data
Base Interface Manager.

The Prompt Generator functions at three levels. The first level creates and updates prompt directories.
Each prompt directory consists of prompt directory parameters and a list of prompts. The second level creates,
updates, and deletes individual prompts in a single selected prompt directory. Each prompt consists of prompt
parameters and a prompt body. The third level creates, updates, and deletes the references to the voice
segments, prompt variables, and tests that define the prompt body in a single selected prompt.
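
The three levels can be pictured as nested records. The Python sketch below is only one possible layout; every class and field name in it is hypothetical, and the real prompt directory format is not given in the text.

```python
from dataclasses import dataclass, field


@dataclass
class BodyItem:
    """Third level: one reference inside a prompt body."""
    kind: str  # "voice", "variable" or "test"
    ref: str   # identifier of the referenced segment, variable or test


@dataclass
class Prompt:
    """Second level: prompt parameters plus a prompt body."""
    prompt_id: int
    parameters: dict = field(default_factory=dict)
    body: list = field(default_factory=list)


@dataclass
class PromptDirectory:
    """First level: directory parameters plus a list of prompts."""
    parameters: dict = field(default_factory=dict)
    prompts: dict = field(default_factory=dict)  # prompt_id -> Prompt

    def add_prompt(self, prompt: Prompt) -> None:
        self.prompts[prompt.prompt_id] = prompt

    def delete_prompt(self, prompt_id: int) -> bool:
        return self.prompts.pop(prompt_id, None) is not None
```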
EDITING VOICE SEGMENTS

To edit a voice segment, a user selects the voice editor text editor panel as shown in function block 800
of Figure 16. The user selects voices in function block 810 and is presented with the list of voice segments,
which are retrieved by the data base function. In function block 810, the user finds a segment of interest using
search or scrolling and, using a mouse, clicks on the segment of interest and selects "Mod Voice" by clicking
on the selection area on the same panel as depicted in function block 830.

The voice application enabler performs the following steps in preparation for editing:

a. presents the Voice Editor Panel;

b. requests the allocation of two voice channels on the voice hardware;

c. places the channels in wrap mode; one in decompression mode and the other in "clear channel" (i.e.,
uncompressed) mode;

d. locates and writes the compressed voice segment to the voice channel in decompression mode, and reads
from the wrapped channel in clear channel mode. Both the compressed and decompressed voice segments
are retained in memory (RAM);

e. as it is received from the decompression process, the wave form of the decompressed voice segment
is displayed in the voice editor panel; and

f. the duration of the voice segment in playing seconds is calculated from its physical length and
displayed on the editor panel.

A user selects the "Set" action from the editor panel using the mouse as indicated in function block 850 in
Figure 16. Also with the mouse, he marks the beginning and end of the digitized voice segment he desires to
modify as highlighted in function blocks 860 and 870. The physical locations in the compressed and
uncompressed voice segment are calculated and the MARK'd section is highlighted. These positions are always
rounded to the nearest twenty ms. boundary, which in the current implementation corresponds to thirty-two bytes
in the compressed segment and one-hundred-sixty bytes in the uncompressed segment. The user can then delete
the marked area as shown in function blocks 880 and 890 and save the result as depicted in function blocks
900 and 910.
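
The position arithmetic described above can be sketched in Python as follows. The frame sizes (twenty ms., thirty-two bytes, one-hundred-sixty bytes) come from the text; the function names, the half-up rounding rule and the treatment of both buffers as plain byte strings are assumptions made for illustration.

```python
FRAME_MS = 20                       # marks snap to 20 ms boundaries
COMPRESSED_BYTES_PER_FRAME = 32     # bytes per 20 ms of compressed voice
UNCOMPRESSED_BYTES_PER_FRAME = 160  # bytes per 20 ms of uncompressed voice


def mark_to_frame(position_ms: float) -> int:
    """Round a marked position to the nearest 20 ms frame (ties round up)."""
    return int((position_ms + FRAME_MS / 2) // FRAME_MS)


def frame_offsets(frame: int) -> tuple:
    """Byte offsets of a frame boundary in the compressed and uncompressed buffers."""
    return frame * COMPRESSED_BYTES_PER_FRAME, frame * UNCOMPRESSED_BYTES_PER_FRAME


def duration_seconds(compressed: bytes) -> float:
    """Playing time derived from the physical length of the compressed segment."""
    return len(compressed) / COMPRESSED_BYTES_PER_FRAME * FRAME_MS / 1000


def delete_marked(compressed: bytes, uncompressed: bytes,
                  start_ms: float, end_ms: float) -> tuple:
    """Remove the same marked frames from both buffers, with no recompression."""
    c_start, u_start = frame_offsets(mark_to_frame(start_ms))
    c_end, u_end = frame_offsets(mark_to_frame(end_ms))
    return (compressed[:c_start] + compressed[c_end:],
            uncompressed[:u_start] + uncompressed[u_end:])
```

Because every mark falls on a frame boundary in both buffers, the compressed data can be cut directly at the matching offset without being decoded and re-encoded.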

Alternatively, the user may play the marked portion of the segment by selecting "Play" on the action bar
of the Editor Panel. In this case the voice is written to the voice hardware channel which is in clear channel
mode and, in turn, is transferred to a headset or speaker. Playing the voice information allows the user to verify
that the correct portion of the segment has been selected.

The marked portion may be copied to another segment or deleted by selecting the appropriate actions with 
the mouse. If the marked portion is deleted, the voice editor rewrites the internal buffers of both the compressed 
and uncompressed versions of the voice segments, and rewrites the digitized wave form to the panel, resizing 
the time scale to fit the panel. 

If a user desires to copy the whole or a portion of a segment into another segment, the portion of the first
segment is marked, and the user switches to the "Other Window" and INSERTS the COPY'd portion into a MARK'd
location of the second segment. The segments are then re-written to their respective buffers and the wave-form
panel is re-generated from the uncompressed form of the data.

When editing of the voice segment is complete, the user selects "Save" from the Action Bar. This action
saves the edited compressed form of the segment by writing it in a file, thereby replacing the original. Thus,
editing is accomplished without successive decompression and recompression of the stored voice data and
without successive distortion in the voice as a result.

PLAYING VOICE PROMPTS 


Preparation of the prompt 

To prepare a voice prompt, the user starts the Voice Application Generator (VAG) by selecting "VAG" with
the mouse on the Motif Menu Bar as shown in function block 1000 of Figure 17. The user is presented with a
pull-down menu with 3 options: States, Prompts and Voices. Then, the user positions the cursor over
"Prompts" and presses the mouse button to signify selection of the function used to define the prompts he wishes
to use for his voice application as shown in function block 1010.

The user is presented with the Prompt Directory Editor Panel. The Prompt Directory Editor panel displays
a list of all the currently-defined prompt directory entries. By selecting "Options" the user can choose to display
or work with alternative versions of the prompt which may have been defined in other languages. Then, the
user selects "Add", as shown in function block 1020, from the Prompt Editor Panel. The input subpanel appears
on the right of the Editor Panel.

On the input subpanel, the user enters Prompt ID number and Purpose as shown in function block 1030,
then clicks on "Apply". Then, the user is presented with a list of voice segments available, and an expanded
prompt list to the right of the list, and a selection of relationship operators as discussed earlier. The user selects
a voice segment or variable, as shown in function block 1040, and optionally a conditional test which allows
the referenced segment to be conditionally played depending on the value of the data in a defined variable (e.g.,
today's date or time, value entered by a caller, etc.). Complex IF..THEN..ELSE logic is also supported.

If the segment selected is a variable, a pull-down menu is presented which defines the manner in which
the data stored in the variable is to be vocalized (i.e., PLAY AS..). The user selects how the variable (segment)
is to be vocalized (e.g., digit, number, ordinals, currency, date, time, etc.) as shown in function block 1050.
Variables used as prompt segments are stored as a combination of the pointer to the variable data plus codes
defining how the data will be played.

The user can continue the process discussed above, stringing prompt segments together, as shown in
function block 1060, until the prompt is complete. Then, the user clicks on SAVE, saving the prompt for use in the
application as depicted in function blocks 1065 and 1075.

15 Playing Prompts with Complex Variables 

An application script is started by a telephone call to the system as shown in function block 1080 of Figure
18. The script references prompts defined during the application development process as shown in function
block 1090. Upon execution of the "PLAY PROMPT" action in the script, as depicted in function block 1095,
the VAE SESSION MANAGER steps through the list of segments defined by the prompt. Each type of segment
(i.e., static voice or variable) is played according to its "Play As" specification by passing control to a function
defined for that "Play As" type. A flow diagram of this process is illustrated in Figure 15.
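
Passing control to a function defined for each "Play As" type amounts to a dispatch table. The Python sketch below illustrates the idea with two hypothetical handlers; the segment dictionary layout, the handler names and the variable store are assumptions, not the actual Session Manager interfaces.

```python
def play_as_digits(value) -> list:
    """Vocalize each character of the value separately, e.g. 42 -> <4><2>."""
    return [f"<{ch}>" for ch in str(value)]


def play_as_number(value) -> list:
    """Vocalize the value as a single number, e.g. 42 -> <42>."""
    return [f"<{value}>"]


# One handler per "Play As" type; further types (ordinal, currency, date,
# time) would be registered in the same way.
PLAY_AS = {"digit": play_as_digits, "number": play_as_number}


def play_prompt(segments: list, variables: dict) -> list:
    """Step through a prompt's segments and expand each into what is played."""
    played = []
    for segment in segments:
        if segment["type"] == "voice":
            played.append(f"<{segment['segment_id']}>")  # static voice segment
        else:
            handler = PLAY_AS[segment["play_as"]]
            played.extend(handler(variables[segment["variable"]]))
    return played
```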

Actual vocalization is performed by reference to system-level segment primitives (e.g., "hundred", "p.m.",
etc.) which are derived from the rules established for each language available in the system as depicted in
function blocks 1100 to 1115. For example, currency and time are vocalized according to national custom. These
primitives are selected, in effect, by parsing the value of the variable according to these rules. Upon completion
of a prompt, control passes to the next script action.

Language-Independent Voice Applications


To define a new application in another language, the system administrator selects the NLS Editor from the
main menu as depicted in Figure 19, function block 1200. The Editor panel is presented, with a list of
currently-defined languages. English is the base (default) language.

The user selects "Options - New Languages", as shown in function block 1210, and enters the name of
the new language to be defined, as illustrated in function block 1220, and clicks on "OK". Then, the system
copies the English-based language files into a shadow data base. The system then remaps the display keyboard
by selecting "Keyboards" on the NLS Editor Panel. A list of available keyboard mappings is displayed in the
Keyboard Selection Panel. The user can narrow the list by selecting "Filter" and the 2-digit code specifying the
language of the keyboard of interest (e.g., French) as depicted in function block 1230.

The user then edits the language-based file groups affecting the user (display) interface as depicted in
function blocks 1255 to 1300. These file groups include:

(a) application developer standard interface terms;

(b) common VAE user interface terms;

(c) VAG Editors;

(d) administrator profile;

(e) mailbox terms; and

(f) user profile.

Then, the user edits language configuration parameters (NLS Rules) in the system configuration panel, as
shown in function blocks 1310 to 1320, including:

(a) variable mapping table;

(b) number format;

(c) telephone number format;

(d) currency format; and

(e) date format.


Application Development (VAG) 

To develop the application after the language tailoring has been completed, the user selects "VAG" from
the VAE Sign-on Panel, as shown in Figure 20 at function block 1335. Then, the user selects one of three voice
application development tools, as shown in function block 1345, from the pull-down menu:

(1) "State Generator" to define application scripts;

(2) "Prompt Generator" to define prompts to be invoked by the scripts; or

(3) "Voice Generator" to record, playback, and edit voice segments used in the prompts.

From the application development tool menu, the user selects "Options" in the selected tool at function block
1355, then "Language" in the pull-down menu at function block 1365. Then, the user selects a language from
the list of defined languages in the pull-down menu. The selected language will be used during the development
session. The user can develop applications in the chosen language, including scripts, prompt definitions, and
voice prompt segments as shown in function block 1375.



NLS Execution 



A script is prepared using VAG as discussed above and is invoked using appropriate signalling (i.e., DID,
SMSI, dedicated channel, etc.) interfaces. The script executes using NLS rules for vocalizing prompts. The rules
are table-driven and defined in the System Configuration section as discussed above. There is one set of rules
per language. The set used is the one corresponding to the active language at the time of script execution.

As discussed in greater detail above, the NLS general form is:
<rule meta><qualifier><optional voice segments>...
Rule meta characters and qualifiers include the following:

Meta Char    Qualifier    Word

/            9            "Billion"

             8            100,000,000

d            1            Play as number ("one")

             2            Play as ordinal ("first")

             3            Play as days of month

etc. (there are meta characters for the year, the time, and for currency)

Complex variables, such as those above, are thus broken down into smaller "primitive" pieces to play as
individual words or phrases, according to the NLS rules. These primitives are identified according to a variable
mapping table, for example:

Variable ID    Variable Value    Segment ID

MONTHNAME      1                 Pointer to January
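
A lookup against such a mapping table can be sketched in Python as shown below. Only the MONTHNAME / 1 / January row comes from the text; the second row, the function name and the use of plain strings in place of segment pointers are hypothetical.

```python
# (variable id, variable value) -> voice segment primitive.
# A real table would hold a pointer to the recorded segment; a string
# label stands in for that pointer here.
VARIABLE_MAP = {
    ("MONTHNAME", 1): "January",   # row given in the text
    ("MONTHNAME", 2): "February",  # hypothetical additional row
}


def lookup_primitive(variable_id: str, value: int) -> str:
    """Return the primitive segment to play for a variable's value."""
    try:
        return VARIABLE_MAP[(variable_id, value)]
    except KeyError:
        raise LookupError(f"no segment mapped for {variable_id}={value}") from None
```

Keeping one such table per language lets the same script play a different segment for the same variable value, depending on which language is active.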



An execution logic NLS flow chart is provided in Figure 21. An incoming call is detected as suggested in
function block 1400. The detection invokes an appropriate script that generates the voice prompt playing as
pointed out in function block 1410. To execute the script, the prompt definition and the NLS rules must be
retrieved as illustrated in function blocks 1420 and 1425. This information allows the prompt generator to decode
the PLAY-AS specifications and decode the complex voice information into primitives that can be played back
to the caller as shown in function blocks 1435 and 1440.




Claims 

1. A method for editing compressed voice information, comprising the steps of:

selecting a segment of compressed voice information;

decompressing the compressed voice segment and displaying the decompressed voice information
on a display;

marking a portion of the displayed voice information;

calculating the location and extent of a portion of the compressed voice segment corresponding to
said marked portion;

editing the marked portion of the decompressed voice information; and

correlating the editing actions from the decompressed voice information to the compressed voice
information.

2. A method as claimed in claim 1, the step of editing comprising deleting said marked portion from said
decompressed voice information and the step of correlating the editing actions comprising deleting the 
corresponding portion of the compressed information. 

3. A method as claimed in claim 1 further comprising: 

selecting a second segment of compressed voice information; 

decompressing the second voice segment and displaying the second segment of decompressed 
voice information on a display; 

marking a location within the displayed second segment; and 

copying the marked portion from the first segment to said location in said second segment.

4. A method as claimed in any preceding claim wherein said marked portion may comprise the whole of said
voice segment.

5. A method as claimed in any preceding claim wherein the step of decompressing said stored compressed 
voice information comprises: 

requesting allocation of two voice channels on voice hardware; 
placing the channels in wrap mode; 

locating and writing the compressed voice segment to the first allocated voice channel to decom- 
press the segment; and 

reading the decompressed voice segment from the second allocated voice channel. 

6. A method as claimed in any preceding claim, further comprising storing said edited compressed voice infor- 
mation on to a permanent record medium. 

7. Apparatus for editing compressed voice information, comprising: 

means for selecting a segment of compressed voice information; 

means for decompressing the compressed voice segment and displaying the decompressed voice 
information on a display; 

means for marking a portion of the displayed voice information; 

means for calculating the location and extent of a portion of the compressed voice segment corre- 
sponding to said marked portion; 

means for editing the marked portion of the decompressed voice information; and 
means for correlating the editing actions from the decompressed voice information to the compres- 
sed voice information. 

8. Apparatus as claimed in claim 7, the editing means comprising means for deleting said marked portion 
from said decompressed voice information and the correlating means comprising means for deleting the 
corresponding portion of the compressed information. 

9. Apparatus as claimed in claim 7 further comprising: 

means for selecting a second segment of compressed voice information; 

means for decompressing the second voice segment and displaying the second segment of decom- 
pressed voice information on a display; 

means for marking a location within the displayed second segment; and 

means for copying the marked portion from the first segment to said location in said second
segment.

10. Apparatus as claimed in any of claims 7 to 9, further comprising means for storing said edited compressed 
voice information on to a permanent record medium. 
Field of the Invention

The present invention relates generally to information services and in particular to user interfaces for in- 
formation services. 


Background of the Invention 

Information services are widely used to provide access to and management of information or data.
Examples of information services include financial services, such as those used by individuals to purchase
securities or transfer funds; database services, such as those used to store, search for and retrieve information;
and telephone services, such as those used to identify and dial telephone numbers. Typically, a user interacts
with an information service with the aid of a user interface. The interface may include audio and graphical
features supported by an input/output (I/O) device, such as, for example, a personal computer, computer
terminal, or telephone.

Information service user interfaces are often described as tree-like in nature, having nodes and branches.
The nodes of the tree represent explicit or implicit questions or requests ("requests") for information to be put
to a service user. User responses to such requests allow an information service to determine the type of
processing and functions desired. For example, a service may request a stock name for which a price quote is
sought by a user, or a telephone number which a user desires to dial. The branches of the tree represent paths
between successive requests, or paths between a request and a function to be performed by the service.

Information responsive to a request may be provided to an information service by any number of input
techniques and associated devices. These include speech through a microphone, a keyboard or key-pad, a
pen-like stylus, bar-code or magnetic media scanning, push-buttons, touch-screen technology, etc. Depending
on the nature of the information service or the tasks required of the user, one or more of such techniques may
be preferred over others. For example, voice entry of information may be preferred in some instances to speed
and simplify information service operation for users. Voice entry may also be preferred because there is no
alternative I/O device, or because of special needs of a user (e.g., due to a handicap).

As a consequence of the nature or use of an input technique or its associated device, the content of
information received by an information service interface in response to a request may be subject to some degree
of uncertainty. For example, in the form received from a microphone, the content or meaning of speech signals
may not be recognizable by the information service; signals received from a stylus or bar code scanner may
be corrupted in some fashion; or, more than one key on a keypad or element in a touch-screen system may
be depressed accidentally. In each of these cases, the content of received information is uncertain. Prior to
proceeding with service processing, the information service interface needs to address such uncertainties of
received information content. In the illustrative case of speech input, the information service interface must
perform processing to recognize the content of spoken words such that the information will be in a form useful
to the service.

Summary of the Invention 


The present invention provides a method and apparatus for resolving uncertainty in the content of
information received as input to an information service. Resolution of uncertainty is provided by reference to a
database containing likely responses to requests for information. A response is deemed likely based on an a priori
probability that the response will be provoked by a given request. A priori probabilities therefore indicate with
what information a given user is likely to respond when presented with a given request. They may be
determined either quantitatively or qualitatively based on, among other things, the nature of the information service
or experience with its use.

Information of uncertain content received by the service interface is compared to the likely stored
responses for the purpose of resolving the uncertainty. An illustrative embodiment of the present invention may perform
the comparison in any of several ways. For example, the received information may be identified as the stored
response to which it most closely compares based on a similarity metric. The received information may be
tentatively identified as discussed above and an information service user be provided with a "right of refusal" of
the identified information, to be exercised in the event that the a priori probable responses stored in the
database do not provide for a reasonable resolution of the uncertainty.

Furthermore, the received information may be identified, tentatively or otherwise, as the first stored
response encountered in the database (or portion thereof) with which a comparison to the received information
yields an acceptable measure of similarity. This technique may be used in conjunction with an ordering of likely
responses in the database based on likelihood of use.
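
The two comparison strategies described above can be contrasted in a short Python sketch. It assumes text input and uses the standard library's difflib.SequenceMatcher as the similarity metric; the text does not specify a metric, and the function names and the threshold value used in the example are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional


def similarity(received: str, stored: str) -> float:
    """Similarity in [0, 1] between received input and a stored likely response."""
    return SequenceMatcher(None, received.lower(), stored.lower()).ratio()


def closest_response(received: str, likely: list) -> str:
    """Strategy 1: identify the input as the stored response it most closely matches."""
    return max(likely, key=lambda stored: similarity(received, stored))


def first_acceptable(received: str, likely: list, threshold: float) -> Optional[str]:
    """Strategy 2: scan responses in order of likelihood and stop at the first one
    whose similarity is acceptable; return None if no response qualifies."""
    for stored in likely:
        if similarity(received, stored) >= threshold:
            return stored
    return None


if __name__ == "__main__":
    likely_stocks = ["IBM", "IBN"]  # assumed to be ordered most likely first
    print(closest_response("IBN", likely_stocks))
    print(first_acceptable("IBN", likely_stocks, threshold=0.6))
```

The second strategy can return a different answer from the first: with the list ordered by likelihood, a sufficiently similar and more probable response is accepted before a closer but less probable one is ever examined. A None result corresponds to the case where the stored responses do not resolve the uncertainty.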
7. Level-based constraints that classify any part of the database depending on the security level of some
data;

8. Fuzzy constraints that assign fuzzy values to their classifications; and

9. Logical constraints that specify implications.

The method and apparatus disclosed are based on processing certain security constraints during query
processing, certain constraints during database updates and certain constraints during database design. FIG. 1
shows a schematic view of the integrated architecture. The two main tasks involved in constraint handling are
constraint generation and constraint enforcement. The constraint generator takes the specification of the
multilevel application and outputs the initial schema and the constraints that must be enforced. The database design
tool takes this output as its input and designs the database. The constraints and schema produced by the
database design tool are used by the update processor and the query processor. Although the query processor,
update processor and database design tool are separate modules, they all constitute the solution to constraint
processing in multilevel relational databases; these three approaches provide an integrated solution to security
constraint processing in a multilevel environment. In the architecture shown in FIG. 1, the constraints and
schema which are produced by the constraint generator are processed further by the database design tool. The
modified constraints are given to the constraint updater in order to update the constraint database. The schema
is given to the MLS/DBMS to be stored in the metadatabase. The constraints in the constraint database
are used by the query and update processors. We assume that there is a trusted constraint manager process
which manages the constraints. In a dynamic environment where the data and the constraints are changing,
the query processor will examine all the relevant constraints and ensure that users do not obtain
unauthorized data.

Constraints that classify an attribute or collection of attributes taken together are handled during the
database design operation. These include the simple and association-based constraints. When constraints are
processed during the database design operation, the database design tool will determine the security levels
to be assigned to the database schema (i.e., the information about the data in the database).

A query processor is disclosed that has the ability to handle all of the security constraints. Most users usually
build their reservoir of knowledge from responses that they receive by querying the database. It is from this
reservoir of knowledge that they infer unauthorized information. Moreover, no matter how securely the
database has been designed or how accurately the data within is labeled, users could eventually violate security
by inference because they are continuously updating their reservoir of knowledge as the world evolves. It is
not feasible to have to redesign the database or to have to reclassify the data continuously. When constraints
are processed during the query operation, the query processor will compute the security levels of the data
before release and ensure that only the data at or below the user's level is released.

The processor (also called an inference controller) protects against certain security violations via inference
that can occur when users issue multiple requests and consequently infer unauthorized knowledge. The
processor also uses security constraints as its mechanism for determining the security level of data. The security
constraints are used as derivation rules. The architecture for a query processor is shown in FIG. 2. This
architecture can be regarded as a loose coupling between a multilevel relational database management system and a
deductive manager. The deductive manager is referred to as the query processor. It operates on-line.

An update processor prototype is disclosed. The processor utilizes simple and content-dependent
security constraints as guidance in determining the security level of the data being updated. The use of security
constraints can thereby protect against users incorrectly labelling data as a result of logging in at the wrong
level, against data being incorrectly labelled when it is imported from systems of different modes of operation
and against database inconsistencies as a consequence of the security label of data in the database being affected
by data being entered into the database. The architecture for the update processor is shown in FIG. 3. This
architecture can be regarded as a loose coupling between a multilevel relational database management
system and a deductive manager. The deductive manager is referred to as the update processor. It can be used
on-line where the security levels of the data are determined during database inserts and updates, or it could
be used off-line as a tool that ensures that data entered via bulk data loads and bulk data updates is accurately
labelled.

The security level of an update request is determined by the update processor as follows. The simple and
content-dependent security constraints associated with the relation being updated and with a security label
greater than the user login security level are retrieved and examined for applicability. If multiple constraints
apply, the security level is determined by the constraint that specifies the highest classification level. If no
constraints apply, the update level is the login security level of the user. The update processor does not
determine the security level of the data solely from the security constraints, but utilizes the constraints as guidance
in determining the level of the input data.

In the disclosed apparatus and method, all constraints except for the release and aggregate constraints can
theoretically be handled during the database update operation. When constraints are processed during the
update operation, the update processor will compute the security levels of the data being updated and ensure
that the data is stored at the appropriate level.

An MLS DBMS provides assurance that all objects in a database have a security level associated with them
and that users are allowed to access only the data for which they are cleared. Additionally, it provides a mechanism
for entering multilevel data but relies on the user to log in at the level at which the data is to be entered.
The Update Processor will provide a mechanism that can operate as a standalone tool with a MLS DBMS to
provide assurance that data is accurately labelled as it is entered into the database. This could significantly
enhance and simplify the ability of an MLS DBMS to assure that data entered via bulk data loads and bulk
data updates is accurately labelled.

Another significant use for an update processor is in operation with an Inference Controller which functions
during query processing. The Inference Controller protects against certain security violations via inference
that can occur when users issue multiple requests and consequently infer unauthorized knowledge. The
Inference Controller Prototype also utilizes security constraints as its mechanism for determining the security
level of data. The security constraints are used as deri- 
vation rules as they are applied to the data during query 
processing. Addressing all of the security constraint
types mentioned above could add a significant burden 
to the query processor particularly if the number of 
constraints is high. To enhance the performance of the 
query processor, the Update Processor can be utilized 
to address certain constraint types as data is entered into 
the database, in particular, simple and content-based 
constraints, alleviating the need for the query processor 
to handle these constraint types. We assume that the 
security constraints remain relatively static, as reliance 
on the Update Processor to ensure that data in the data- 
base remains consistent would be difficult, particularly 
in a volatile environment where the constraints change 
dynamically. An additional concern is that database 
updates could leave the database in an inconsistent state. 
The Update Processor, however, is designed to reject 
updates that cause a rippling effect and thus leave the 
database in an inconsistent state. 
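
The level-assignment rule stated for the update processor (examine the constraints classified above the user's login level, take the highest level among those that apply, and fall back to the login level when none apply) can be sketched as follows. This is a minimal illustration only: the three-level ordering, the `update_level` function, and the sample constraints on captain name and assignment number are assumptions made for the example, not the disclosed implementation.

```python
# Hypothetical ordering of the security levels used in this sketch.
RANK = {"Unclassified": 0, "Secret": 1, "Top Secret": 2}


def update_level(row, login_level, constraints):
    """Return the level at which `row` would be stored.

    `constraints` is a list of (condition, level) pairs. Only constraints
    whose level is above the login level are examined; the highest
    applicable level wins, otherwise the login level is used.
    """
    applicable = [
        level
        for condition, level in constraints
        if RANK[level] > RANK[login_level] and condition(row)
    ]
    return max(applicable, key=lambda level: RANK[level], default=login_level)


# Illustrative content-based constraints (assumed for the example).
CONSTRAINTS = [
    (lambda row: row["CAPTAIN"] == "Smith", "Secret"),
    (lambda row: row["A#"] == 10, "Top Secret"),
]

print(update_level({"CAPTAIN": "Smith", "A#": 10}, "Unclassified", CONSTRAINTS))
print(update_level({"CAPTAIN": "Jones", "A#": 3}, "Unclassified", CONSTRAINTS))
```

Taking the maximum over the applicable constraints mirrors the statement that the constraint specifying the highest classification level determines the result.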

A method and apparatus for handling constraints 
during database design is disclosed. The database design 
tool is shown in FIG. 4. The constraint generator takes 
the specification of the multilevel application and out- 
puts the initial schema and the constraints that must be 
enforced. The database design tool takes this output as 
its input and designs the database. The constraints and 
schema produced by the database design tool are used 
by the update processor and the query processor. 

BRIEF DESCRIPTION OF THE DRAWING 

FIG. 1 is a block diagram of the integrated architec- 
ture of the invention. 

FIG. 2 is a block diagram illustrating the query pro- 
cessor of the invention. 

FIG. 3 is a block diagram illustrating the update
processor of the invention.

FIG. 4 is a block diagram illustrating a multi-level
database design tool.

FIG. 5 is a block diagram illustrating constraint gen- 
eration and enforcement. 

FIG. 6 is a block diagram illustrating high-level ar- 
chitecture. 

FIG. 7 is a schematic illustration of a constraint struc- 
ture. 

FIG. 8 is a schematic illustration of the major mod- 
ules of the invention. 

FIG. 9 is a block diagram illustrating another high- 
level architecture. 

FIG. 10 is a block diagram illustrating the implemen- 
tation architecture according to the invention. 

PREFERRED EMBODIMENT 

The theoretical approach to security constraint pro- 
cessing is first presented. Then we give the implementa- 
tion details for the query processor, the update proces- 
sor and the database design tool. 

1. SECURITY CONSTRAINTS

1.1 OVERVIEW 

Security constraints are rules which assign security
levels to the data. They can be used either as integrity
rules, derivation rules or as schema rules (such as data
dependencies). If they are used as integrity rules, then
they must be satisfied by the data in the multilevel
database. If they are used as derivation rules, they are
applied to the data during query processing. If they are
used as data dependencies, they must be satisfied by the
schema of the multilevel database.

We have defined various types of security constraints.
They include the following:
(i) Constraints that classify a database, relation or an
attribute. These constraints are called simple
constraints.

(ii) Constraints that classify any part of the database
depending on the value of some data. These
constraints are called content-based constraints.

(iii) Constraints that classify any part of the database 
depending on the occurrence of some real-world 
event. These constraints are called event-based 
constraints. 

(iv) Constraints that classify associations between 
data (such as tuples, attributes, elements, etc.). 
These constraints are called association-based con- 
straints. 

(v) Constraints that classify any part of the database
depending on the information that has been
previously released. These constraints are called
release-based constraints. We have identified two types of
release-based constraints. One is the general release
constraint which classifies an entire attribute
depending on whether any value of another attribute
has been released. The other is the individual
release constraint which classifies a value of an
attribute depending on whether a value of another
attribute has been released.

(vi) Constraints that classify collections of data. 
These constraints are called aggregate constraints. 

(vii) Constraints which specify implications. These 
are called logical constraints. 

(viii) Constraints which have conditions attached to
them. These are called constraints with conditions.

(ix) Constraints that classify any part of the database 
depending on the security level of some data. 
These constraints are called level-based con- 
straints. 

(x) Constraints which assign fuzzy values to their 
classifications. These are called fuzzy constraints. 

We will give examples of constraints for each
category. In our examples, we assume that the database
consists of two relations SHIPS and ASSIGNMENT
where SHIPS has attributes S#, SNAME, CAPTAIN,
and A# (with S# as the key), and ASSIGNMENT has
attributes A#, MISSION, and DESTINATION (with
A# as the key). Note that A# in SHIPS and A# in
ASSIGNMENT take values from the same domain.
The constraints may be expressed as some form of
logical rules. We have chosen Horn clauses to represent the
constraints. This way we could eventually take
advantage of numerous techniques that have been developed
for logic programs.
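
As a rough illustration of the rule form, a constraint can be held as a record whose body names a relation (plus an optional condition on a tuple) and whose head names the classified attributes and their level. The sketch below is only one possible encoding: the `Constraint` class, its field names, and the sample rows are assumptions introduced for the example and are not taken from the disclosed implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class Constraint:
    """Rule of the form: relation AND condition -> Level(attributes) = level."""
    relation: str                       # body: relation the rule ranges over
    classified: Tuple[str, ...]         # head: attributes being classified
    level: str                          # head: assigned security level
    condition: Optional[Callable[[Row], bool]] = None  # None for a simple constraint

    def applies(self, relation: str, row: Row) -> bool:
        """True if the rule body holds for this tuple of this relation."""
        if relation != self.relation:
            return False
        return self.condition is None or self.condition(row)


# A simple constraint: CAPTAIN in SHIPS is Secret.
simple = Constraint("SHIPS", ("CAPTAIN",), "Secret")

# A content-based constraint: CAPTAIN is Secret when the ship name is CHAMPION.
content_based = Constraint(
    "SHIPS", ("CAPTAIN",), "Secret", lambda row: row["SNAME"] == "CHAMPION"
)

# Sample tuples (made up for the example).
champion_row = {"S#": 1, "SNAME": "CHAMPION", "CAPTAIN": "Smith", "A#": 10}
other_row = {"S#": 2, "SNAME": "FLORIDA", "CAPTAIN": "Jones", "A#": 11}

print(content_based.applies("SHIPS", champion_row))
print(content_based.applies("SHIPS", other_row))
```

Keeping the condition as an optional callable lets the same record type cover both the unconditional and the value-dependent rule forms.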

Simple constraints: R(A1, A2, ..., An) ->
Level(Ai1, Ai2, ..., Ait) = Secret
[Each attribute Ai1, Ai2, ..., Ait of relation R is Secret]
Example: SHIPS(S#, SNAME, CAPTAIN, A#) ->
Level(CAPTAIN) = Secret

Content-based constraints: R(A1, A2, ..., An) AND
COND(Value(B1, B2, ..., Bm)) ->
Level(Ai1, Ai2, ..., Ait) = Secret
[Each attribute Ai1, Ai2, ..., Ait of relation R is Secret
if some specific condition is enforced on the value of
some data specified by B1, B2, ..., Bm]
Example: SHIPS(S#, SNAME, CAPTAIN, A#) AND
(Value(SNAME) = CHAMPION) -> Level(CAPTAIN) = Secret.

Association-based constraints (also called context or
together constraints): R(A1, A2, ..., An) ->
Level(Together(Ai1, Ai2, ..., Ait)) = Secret
[The attributes Ai1, Ai2, ..., Ait of relation R taken
together are Secret]
Example: SHIPS(S#, SNAME, CAPTAIN, A#) ->
Level(Together(SNAME, CAPTAIN)) = Secret.

Event-based constraints: R(A1, A2, ..., An) AND
Event(E) -> Level(Ai1, Ai2, ..., Ait) = Secret
[Each attribute Ai1, Ai2, ..., Ait of relation R is
Secret if event E has occurred]
Example: SHIPS(S#, SNAME, CAPTAIN, A#) AND
Event(Change of President) ->
Level(CAPTAIN, A#) = Secret.

General release-based constraints: R(A1, A2, ..., An)
AND Release(Ai, Unclassified) -> Level(Aj) = Secret
[The attribute Aj of relation R is Secret if the
attribute Ai has been released at the Unclassified level]
Example: SHIPS(S#, SNAME, CAPTAIN, A#) AND
Release(SNAME, Unclassified) -> Level(CAPTAIN) = Secret.

Individual release-based constraints: R(A1, A2, ..., An)
AND Individual-Release(Ai, Unclassified) ->
Level(Aj) = Secret
The individual release-based constraints classify
elements of an attribute at a particular level after the
corresponding elements of another attribute have been
released. They are more difficult to implement than the
general release-based constraints. In our implementation,
the individual release-based constraints are handled
after the response is assembled while all of the
other constraints are handled before the response is
generated.

Aggregate constraints: Aggregate constraints classify
collections of tuples taken together at a level
higher than the individual levels of the tuples in the
collection. There could be some semantic association
between the tuples. We specify these tuples in
the following form: R(A1, A2, ..., An) AND
Set(S,R) AND Satisfy(S,P) -> Level(S) = Secret
This means that if R is a relation and S is a set
containing tuples of R and S satisfies some property
P, then S is classified at the Secret level. Note
that P could be any property such as "number of
elements is greater than 10."

Logical constraints: Logical constraints are rules
which are used to derive new data from the data in
the database. The derived data could be classified
using one of the other constraints. Logical constraints
are of the form: Ai ==> Aj if condition C
holds. This constraint can be instantiated as follows:
The location of a ship implies its mission if
the location is the Persian Gulf.

Other constraints:
There are several other types of constraints which
could be incorporated into our design fairly easily.
These include level-based constraints and fuzzy
constraints. We describe them below.

Level-based constraints: R(A1, A2, ..., An) AND
Level(Ai) = Unclassified -> Level(Aj) = Secret
(The attribute Aj of relation R is Secret if the
attribute Ai is Unclassified)
Example: SHIPS(S#, SNAME, CAPTAIN, A#) AND
Level(SNAME) = Unclassified -> Level(CAPTAIN) = Secret

Fuzzy constraints: Fuzzy constraints are constraints
which use fuzzy values. They can be associated with
any of the other types of constraints. An example of a
fuzzy constraint which is associated with a content-based
constraint is given below.
R(A1, A2, ..., An) AND COND(Value(B1, B2, ..., Bm)) ->
Level(Ai1, Ai2, ..., Ait) = Secret and Fuzzyvalue = r
(Each attribute Ai1, Ai2, ..., Ait of relation R is Secret
with a fuzzy value of r if some specific condition is
enforced on the values of some data specified by
B1, B2, ..., Bm)
Example: SHIPS(S#, SNAME, CAPTAIN, A#) AND
(Value(SNAME) = CHAMPION) ->
Level(CAPTAIN) = Secret and Fuzzyvalue = 0.8.

Complex constraints

The examples of constraints that we have given above
are enforced on a single relation only. Note that
constraints can also be enforced across relations. We call
such constraints complex constraints. An example is
given below:

R1(A1, A2, ..., An) & R2(B1, B2, ..., Bm) &
R1.Ai = R2.Bj (1 ≤ i ≤ n, 1 ≤ j ≤ m) ->
Level(Together(Ak, Bp)) = Secret
(where 1 ≤ k ≤ n, 1 ≤ p ≤ m)

This constraint states that pairs of values involving the
kth attribute of R1 and the pth attribute of R2 are Secret
provided the corresponding values (i.e. in the same
row) of the ith attribute of R1 and the jth attribute of R2
are equal.

1.2 APPROACH TO SECURITY CONSTRAINT
PROCESSING

Security constraints enforce a classification policy.
Therefore it is essential that constraints are manipulated
only by an authorized individual. In our approach
constraints are maintained by the SSO. That is, constraints
are protected from ordinary users. We assume that
constraints themselves could be classified at different
security levels. However, they are stored at system-high.
The constraint manager, which is trusted, will
ensure that a user can read the constraints classified
only at or below his level.

Our approach to security constraint processing is to
handle certain constraints during query processing,
certain constraints during database updates and certain
constraints during database design. The first step was to
decide whether a particular constraint should be
processed during the query, update or database design
operation. After some consideration, we felt that it was
important for the query processor to have the ability to
handle all of the security constraints. Our thesis is that
inferences can be most effectively handled, and thus
prevented, during query processing. This is because
most users usually build their reservoir of knowledge
from responses that they receive by querying the
database. It is from this reservoir of knowledge that they
infer unauthorized information. Moreover, no matter
how securely the database has been designed, users
could eventually violate security by inference because
they are continuously updating their reservoir of
knowledge as the world evolves. It is not feasible to
have to redesign the database simultaneously.

The next step was to decide which of the security
constraints should be handled during database updates.
After some consideration, we felt that except for some
types of constraints such as the release and aggregate
constraints, the others could be processed during the
update operation. However, techniques for handling
constraints during database updates could be quite
complex as the security levels of the data already in the
database could be affected by the data being updated.
Therefore, initially our algorithms handle only the
simple and content-based constraints during database
updates.

The constraints that seemed appropriate to handle
during the database design operation were those that
classified an attribute or collections of attributes taken
together. These include the simple and association-based
constraints. For example, association-based
constraints classify the relationships between attributes.
Such relationships are specified by the schema and
therefore such constraints could be handled when the
schema is specified. Since a logical constraint is a rule
which specifies the implication of an attribute from a set
of attributes, it can also be handled during database
design.

Note that some constraints can be handled in more
than one way. For example, we have the facility to
handle the content-based constraints during query
processing as well as during database updates. However, it
may not be necessary to handle a constraint in more
than one place. For example, if the content-based
constraints are satisfied during the database update
operation, then it may not be necessary to examine them
during query processing also. Furthermore, the query
operation is performed more frequently than the update
operation. Therefore, it is important to minimize the
operations performed by the query processor as much
as possible to improve performance. However, there
must be a way to handle all of the constraints during
query processing. This is because, if the real world is
dynamic, then the database data may not satisfy all of
the constraints that are enforced as integrity rules, or
the schema may not satisfy the constraints that are
processed during database design. This means that there
must be a trigger which informs the query processor
that the multilevel database or the schema is not
consistent with the real world; in which case the query
processor can examine the additional constraints. A
schematic representation of the approach to constraint
generation and enforcement disclosed here is shown in
FIG. 5.

2. DESIGN AND IMPLEMENTATION OF THE
QUERY PROCESSOR

2.1 OVERVIEW

We first describe a security policy for handling
inferences during query processing and then discuss our
implementation approach.

2.1.1 SECURITY POLICY

A security policy for query processing that we
propose extends the simple security property in Bell, D.,
and L. La Padula, July 1975, "Secure Computer
Systems: Unified Exposition and Multics Interpretation,"
Technical Report NTIS AD-A023588, The MITRE
Corporation to handle inference violations. This policy
is stated below.*

*Such a policy was first proposed in the LDV design [HONE87].

1. Given a security level L, E(L) is the knowledge
base associated with L. That is, E(L) will consist of
all responses that have been released at security
level L over a certain time period and the real
world information at security level L.

2. Let a user U at security level L pose a query. Then
the response R to the query will be released to this
user if the following condition is satisfied:
For all security levels L* where L* dominates L, if
(E(L*) ∪ R) ==> X (for any X) then L* dominates
Level(X). Where A ==> B means B can
be inferred from A using any of the inference
strategies and Level(X) is the security level of X.

We assume that any response that is released into a
knowledge base at level L is also released into the
knowledge bases at level L* ≥ L. The policy states that
whenever a response is released to a user at level L, it
must be ensured that any user at level L* ≥ L cannot
infer information classified at a level L+ > L* from the
response together with the knowledge that he has
already acquired. Note that while we consider only
hierarchical levels in specifying the policy, it can be
extended to include non-hierarchical levels also.

2.1.2 FUNCTIONALITY OF THE QUERY
PROCESSOR

The strength of the query processor depends on the
type of inference strategies that it can handle. Our
prototype handles a limited set of inference strategies.
Nevertheless it is a useful prototype which enhances the
security of existing multilevel secure relational database
management systems. In this section, we discuss the
techniques that we have used to implement the security
policy. They are: query modification and response
processing. Each technique is described below.

Query modification

Query modification technique has been used in the
past to handle discretionary security and views.
Stonebraker, M., and E. Wong, 1974, "Access Control in
Relational Database Management Systems by Query
Modification," Proceedings ACM National Conference,
New York, N.Y. This technique has been
extended to include mandatory security in Dwyer, P., G.
Jelatis, B. Thuraisingham, June 1987, "Multilevel
Security in Database Management Systems," Computers and
Security, Volume 6, No. 3, pp. 252-260. In our design of
the query processor, this technique is used by the
inference engine to modify the query depending on the
security constraints, the previous responses released, and
real world information. When the modified query is
posed, the response generated will not violate security.

Consider the architecture for query processing
illustrated in FIG. 1. The inference engine has access to the
knowledge base which includes security constraints,
previously released responses, and real world
information. Conceptually one can think of the database to be
part of the knowledge base. We illustrate the query
modification technique with examples. The actual
implementation of this technique could adapt any of the
proposals given in Gallaire, H., and J. Minker, 1978,
Logic and Databases, Plenum Press for deductive query
processing. Our implementation is described in section
2.2.

Consider a database which consists of relations
SHIPS and ASSIGNMENT where the attributes of
SHIPS are S#, SNAME, CAPTAIN and A# with S# as
the key, and the attributes of ASSIGNMENT are A#,
MISSION and DESTINATION with A# as the key.
Let the knowledge base consist of the following rules:
1. SHIPS(X,Y,Z,A) and Z=Smith -> Level(Y,Secret)
2. SHIPS(X,Y,Z,A) and A=10 -> Level(Y,Top Secret)
3. SHIPS(X,Y,Z,A) -> Level((Y,Z),Secret)
4. SHIPS(X,Y,Z,A) and Release(Z,Unclassified) ->
Level(Y,Secret)
5. SHIPS(X,Y,Z,A) and Release(Y,Unclassified) ->
Level(Z,Secret)
6. NOT(Level(X,Secret) or Level(X,Top Secret)) ->
Level(X,Unclassified)

The first rule is a content-based constraint which
classifies a ship name whose captain is Smith at the
Secret level. Similarly, the second rule is also a
content-based constraint which classifies a ship name whose
assignment number is 10 at the Top Secret level. The
third rule is an association-based constraint which
classifies ship names and captains taken together at the
Secret level. The fourth and fifth rules are additional
restrictions that are enforced as a result of the
context-based constraint specified in rule 3. The sixth rule states
that the default classification level of a data item is
Unclassified.

Suppose an Unclassified user requests the ship names 
in SHIPS. 

This query is represented as follows:

SHIPS(X,Y,Z,A)
Since a ship name is classified at the Secret level if either
the captain is "Smith" or the captain name is already
released at the Unclassified level, and it is classified at
the Top Secret level if the assignment is "10", assuming
that the captain names are not yet released to an
Unclassified user, the query is modified to the following:

SHIPS(X,Y,Z,A) and Z ≠ Smith and A ≠ 10.
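
The rewriting step above can be illustrated with a small in-memory sketch: the conditions of every constraint that classifies ship names above the user's level are negated and added to the query as a filter. The level ordering, the function names, and the sample SHIPS rows are assumptions made for the example; as in the text, the release-based rules are assumed not to have fired yet.

```python
# Hypothetical ordering of the security levels used in this sketch.
RANK = {"Unclassified": 0, "Secret": 1, "Top Secret": 2}

# Content-based constraints on the ship name (rules 1 and 2 of the knowledge base).
NAME_CONSTRAINTS = [
    (lambda row: row["CAPTAIN"] == "Smith", "Secret"),
    (lambda row: row["A#"] == 10, "Top Secret"),
]


def modified_query(user_level):
    """Return a row filter standing in for the rewritten query.

    Every constraint whose level is above the user's level contributes
    the negation of its condition to the filter.
    """
    blocking = [
        condition
        for condition, level in NAME_CONSTRAINTS
        if RANK[level] > RANK[user_level]
    ]
    return lambda row: not any(condition(row) for condition in blocking)


def ship_names(ships, user_level):
    """Evaluate the modified query and project the ship names."""
    keep = modified_query(user_level)
    return [row["SNAME"] for row in ships if keep(row)]


# Sample data (made up for the example).
SHIPS = [
    {"S#": 1, "SNAME": "CHAMPION", "CAPTAIN": "Smith", "A#": 5},
    {"S#": 2, "SNAME": "FLORIDA", "CAPTAIN": "Jones", "A#": 10},
    {"S#": 3, "SNAME": "OHIO", "CAPTAIN": "Brown", "A#": 7},
]

print(ship_names(SHIPS, "Unclassified"))  # only rows with captain != Smith and A# != 10
```

For an Unclassified user both conditions are negated, which corresponds to the modified query given above; a Secret user would only have the A# = 10 condition negated.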

Note that since query modification is performed in
real-time, it will have some impact on the performance
of the query processing algorithm. However, several
techniques for semantic query optimization have been
proposed recently for intelligent query processing in a
non-secure environment (see, for example, Minker, J.,
"Foundations of Deductive Database," Morgan
Kaufmann, 1988). These techniques could be adapted for
query processing in a multilevel environment in order
to improve the performance.

Response Processing 

For many applications, in addition to query
modification, some further processing of the response such as
response sanitization may need to be performed. We
will illustrate this point with examples.

EXAMPLE

Consider the following release constraints discussed
earlier. That is,

(i) all ship names whose corresponding captain names
are already released to Unclassified users are
Secret, and

(ii) all captain names whose corresponding ship
names are already released to Unclassified users are
Secret.

Suppose an Unclassified user requests the ship names
first. Depending on the other constraints imposed, let us
assume that only certain names are released to the user.
Then the ship names released have to be recorded into
the knowledge base. Later, suppose an Unclassified user
(not necessarily the same one) asks for
captain names. The captain name values (some or all)
are then assembled in the response. Before the response
is released, the ship names that are already released to
the Unclassified user need to be examined. Then the
captain name value which corresponds to a ship name
value that is already released is suppressed from the
response. Note that there has to be a way of correlating
the ship names with the captains. This means the
primary key values (the S#) should also be
retrieved with the captain names as well as be stored with
the ship names in the release database.
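
A minimal sketch of this sanitization step is given below, assuming the release database is kept as a mapping from attribute name to the set of S# key values already released at the Unclassified level. The function names, the in-memory data layout, and the sample values are assumptions for illustration only.

```python
def record_release(release_db, attribute, keys):
    """Remember that `attribute` was released for the given S# keys."""
    release_db.setdefault(attribute, set()).update(keys)


def sanitize_captains(response, release_db):
    """Suppress captain names whose ship name has already been released.

    `response` is a list of (S#, captain) pairs; the S# key is what lets
    captain names be correlated with previously released ship names.
    """
    released_ships = release_db.get("SNAME", set())
    return [(key, captain) for key, captain in response if key not in released_ships]


# Ship names for S# 1 and 3 were released earlier (sample values).
release_db = {}
record_release(release_db, "SNAME", {1, 3})

# A later request for captain names is assembled, then sanitized.
response = [(1, "Smith"), (2, "Jones"), (3, "Brown")]
sanitized = sanitize_captains(response, release_db)

# The captain names actually released are recorded in turn.
record_release(release_db, "CAPTAIN", {key for key, _ in sanitized})
print(sanitized)
```

Recording the released captain keys as well allows the symmetric constraint (ship names whose captain names are already released) to be checked in the same way.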

EXAMPLE 

Consider the following aggregate constraint
Suppose an Unclassified user requests the tuples in
SHIPS. The response is assembled and then examined 
to see if it has more than 10 tuples. If so, it is suppressed. 
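
The check described in this example can be sketched in a few lines. The threshold of 10 comes from the text; the function name and the choice to represent a suppressed response as an empty list are assumptions made for the illustration.

```python
def apply_aggregate_constraint(response, max_tuples=10):
    """Release the assembled response only if it holds at most `max_tuples` tuples.

    A response with more tuples is suppressed entirely, represented here
    by an empty list.
    """
    return response if len(response) <= max_tuples else []


small = [("S%d" % i,) for i in range(10)]   # exactly 10 tuples
large = [("S%d" % i,) for i in range(11)]   # 11 tuples

print(len(apply_aggregate_constraint(small)))
print(len(apply_aggregate_constraint(large)))
```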

There are some problems associated with maintaining 
the release information. As more and more relevant 
release information gets inserted, the knowledge base 
could grow at a rapid rate. Therefore efficient tech- 
niques for processing the knowledge base need to be 
developed. This would also have an impact on the per- 
formance of the query processing algorithms. There- 
fore, one solution would be to include only certain 
crucial release information in the knowledge base. The 
rest of the information can be stored with the audit data 
which can then be used by the SSO for analysis. 
2.2. DESIGN AND IMPLEMENTATION 
In section 2.2.1, we describe the various architectures 
that we considered for the implementation and the se- 
lected architecture. In section 2.2.2, we describe the 
representation of the constraints. In section 2.2.3, we 
describe the modules of the query processor. In section 
2.2.4, we discuss some other issues concerning our pro- 
totype. 

2.2.1 ARCHITECTURE COMPARISON 
Alternate Architectures 

We examined three different architectures for the 
implementation. A description of each architecture is 
given below. 

(i) In the first architecture, the database as well as the
knowledge base is considered to be a set of Prolog
clauses. Query processing would then amount to
theorem proving. Many expert systems have been
developed using Prolog (see, for example, Merritt,
D., 1989, Building Expert Systems in Prolog,
Springer Verlag, New York). These systems take
advantage of the backward chaining mechanisms
provided by Prolog. In addition, several other
reasoning mechanisms have also been implemented
using Prolog. Implementing the query processor in
Prolog would produce a fairly powerful system.*
*The implementations described in [ROWE89] use such an architecture.

(ii) The second alternative is to augment a relational 
database management system with a theorem 
prover implemented in Prolog. The advantages of 
augmenting a relational database system with an 
inference engine are discussed in Li, D., 1984, A 
Prolog Database System, Research Studies Press, 
John Wiley and Sons. Many commercial relational 
systems already have a Prolog interface. 

(iii) As the third alternative, we considered an archi- 
tecture where a multilevel relational database sys- 
tem was augmented with an inference engine. Such 
an architecture would be useful as the multilevel 
relational database system would ensure the en- 
forcement of a basic mandatory security policy. 
The inference engine then needs to implement only 
the policy extensions which are enforced in order 
to handle inferences. 

After examining the three architectures, we decided
to select the third one. This was because we are
interested in handling security violations via inference for
database systems which are already considered to be
secure. Commercial multilevel relational systems are
already available. Therefore, we feel that in order to
produce a useful prototype we need to use such a
system which will enforce the basic mandatory security
policy.

Implementation Architecture

Once we had settled on the architecture, the next task
was to select a multilevel relational database system for
the implementation. After investigating the various
systems that were available, we selected the Secure
SQL Server (Sybase Inc., "Secure SQL Server," 1989) for
the following reasons:
(i) the system was already available for our use,
(ii) we had prototyping experience with the nonsecure
version of SYBASE DataServer,
(iii) the system provided the basic security features
that we needed.

A detailed discussion on Secure SQL Server is given in
Rougeau, P., and E. Stearns, "The Sybase Secure
Database Server: A Solution to the Multilevel Secure
DBMS Problem," Proceedings of the 10th National
Computer Security Conference, Baltimore, Md., 1987.
Note that Secure SQL Server enables the use of sixteen
security levels numbered 1 through 16. The basic
mandatory security policy enforced is read at or below your
level and write at your level.

A high level implementation architecture is shown in
FIG. 6. In this architecture, the Secure SQL Server is
augmented with an Inference Engine. We have stored
the knowledge in the database. This way, the
knowledge in the knowledge base can also be protected by the
Secure DataServer. The Inference Engine does query
modification as well as response processing.

We implemented the Inference Engine in "C"
because of the C programming language interface that
already exists for the Secure SQL Server. In the
long term, we envisage replacing such an Inference Engine
with a more powerful logic-based theorem prover.

2.2.2 REPRESENTATION OF SECURITY
CONSTRAINTS

We assume that the constraints are maintained by the
SSO. Constraints themselves could be classified at
different levels. However they are stored at system-high.
The constraint manager, which is a trusted process, will
ensure that a constraint classified at level L can only be
read by a user cleared at level L or higher.

The constraints are entered in a format that is a
simplified version of the rules that we described in section
1. The constraints entered by the SSO are then
processed by a module of the Query processor and stored in
a graphical structure. We found this an efficient way to
represent the constraints. We have developed
algorithms to scan the graph structure in order to obtain the
relevant constraints during query processing. The
algorithms also perform some optimization for efficiency.

The graph structure is illustrated in FIG. 7. The
relations are combined to form a linked list. Each relation
has sixteen pointers emanating from it; one for each
security level. Associated with each level is a linked list
of constraints. Each constraint has a set of attributes
that it classifies, constraint specific information such as
events and conditions, and a pointer to the next
constraint. Our implementation allows for the specification

functions of each module is given below. We also
identify the trust that must be placed on each process.
*Although the operating system used in the implementation is not secure, our
design assumes the use of a multilevel secure operating system.

Process P1: The User Interface Manager
This process asks for password and security level
from the user. Since we assume that the operating
system is secure, we rely on the identification and
authentication mechanism provided by the operating system.
Due to this feature, P1 need not be a trusted process. It
operates at the user's level. P1 accepts a query from the
user and performs syntax check. It then sends the query
to process P2 and returns the response received from P2
to the user. It then waits in idle state for a request from
the user.

Process P2: The Central Inference Controller
This process first sets up communication with P1. It
then waits in idle state for requests from P1. When a
request arrives from P1, it logs into the database server
at the user's level. It then requests process P3 (via
socket) to return applicable constraints. The query is
then modified based on the constraints (if any). The
modified query is then sent to the MLS/DBMS. The
response is then sent to process P4 for further
processing. When P4 returns the sanitized response, a request is
sent to process P5 to update the release database and the
response is given to P1. P2 then returns to idle state. If
constraints classified at a higher level are not processed
by P3 or if the response from the MLS/DBMS is first
given to P4 and P5 for sanitization and release database
update, then P2 need not be a trusted process.
However, in our implementation, since P2 could have access
to higher level information it must be trusted.

Process P3: The Constraint Gatherer
This process first sets up a socket for communication
with P2 and then logs into the database server at
system-high. This is because P3 examines not only the
security constraints classified at or below the user's
level, but also higher level constraints. These higher
level constraints are examined to ensure that by
releasing a response at level L, it is not possible for users
cleared at a higher level to infer information to which
they are not authorized. P3 builds and maintains the
constraint table whenever the constraints are updated.
It waits in idle state for requests from P2. When a
request arrives, it builds a list of applicable constraints and
sends the constraint structure to P2 and then returns to
idle state. Since P3 maintains the security constraints it
is a trusted process.

Process P4: The Sanitizer
This process sets up a socket for communication with
P2 and logs into the database server at system-high. It
waits in idle state for a request to arrive from P1. When
a request arrives, which consists of the response and the
applicable release constraints, it sanitizes the response
based on the information that has previously been
released. It reads the release database maintained at
various levels in order to carry out the sanitization process,
of events and conditions which are quite complex. Each It then returns the sanitized response to P2 and returns 
constraint that is associated with a level classifies a set to idle state. Since response sanitation is a security 
of attributes at that level* Action, P4 must be trusted. 

An alternate representation of constraint is discussed in section 5. Process P5: The Release Database Updater 

2.2.3 MODULES OF THE QUERY PROCESSOR This process sets up communication with P2. It waits 
An overview of the major modules is shown in FIG. in idle state for requests from P2. When a request ar- 

8. The query processor consists of five modules PI 65 rives, it logs into the database server at all levels from 
through P5. Each module is implemented as an Ultrix system-high to the user's level and updates the release 
process.* The processes communicate with each other database at each level depending on the release con- 
via the socket mechanism. A brief overview of the straints for that level. Note that this is necessary only if 
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higher level constraints are examined by P3. If not, P5 
can log into the database server only at the user's level. 
After each update to the release database, it logs out of 
the database server at each leveL It returns a status 
message to P2 upon completion and returns to idle state. 5 
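
The hand-offs described for P2 can be summarized in a short sketch. This is only an illustration of the message flow stated above, not the prototype's C code: the class name `QueryPipeline` and its five callable fields are hypothetical stand-ins for the socket requests to P3, the MLS/DBMS, P4, and P5.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QueryPipeline:
    """Hypothetical stand-in for P2; each field plays the role of one peer."""
    gather_constraints: Callable[[str, int], List[str]]         # role of P3
    modify_query: Callable[[str, List[str]], str]               # query modification
    run_on_dbms: Callable[[str, int], List[tuple]]              # the MLS/DBMS
    sanitize: Callable[[List[tuple], List[str]], List[tuple]]   # role of P4
    update_release_db: Callable[[List[tuple], int], None]       # role of P5

    def handle(self, query: str, user_level: int) -> List[tuple]:
        # 1. Ask P3 for the constraints that apply to this query.
        constraints = self.gather_constraints(query, user_level)
        # 2. Modify the query based on the constraints (if any).
        modified = self.modify_query(query, constraints)
        # 3. Send the modified query to the MLS/DBMS.
        response = self.run_on_dbms(modified, user_level)
        # 4. Let P4 sanitize the response.
        sanitized = self.sanitize(response, constraints)
        # 5. Ask P5 to record what is being released.
        self.update_release_db(sanitized, user_level)
        # 6. The response is then given to P1.
        return sanitized
```

Keeping each peer behind a callable mirrors the process separation in the text: P2 only sequences the steps, while constraint gathering, sanitization, and release tracking stay in separately trusted components.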

2.2.5 GENERAL DISCUSSION 

Over 8500 lines of C code have been implemented in the development of
this prototype. We first developed the infrastructure of the query
processor program. This involved the creation of the five processes and
establishing the necessary communications between them. Some of these
processes also had to log into Secure SQL Server at the appropriate
security levels. The program was set up in such a way as to leave hooks
for the easy addition of more features. Since this project is a
preliminary prototype of a system which could conceivably be extended
in the future, we have tried to continue this approach of flexibility
and modularity to make further expansion of the program easier.

It appears that there is a noticeable performance degradation when
individual release constraints are handled. Large amounts of data need
to be recorded. There are possibilities for optimization and this will
be part of our future work. From the experiments that we have carried
out so far, the performance impact of handling all of the other
constraints is marginal. That is, there is hardly any visible
difference between the execution times of the query processing strategy
with or without the query processor.

It should be noted that in our implementation we have assumed that the
process P3 examines constraints not only classified at or below the
user's level, but also the higher level constraints. That is, the
higher level constraints have an impact on the query modification. From
the responses received a user may be able to infer the constraints at
the higher level. If on the other hand the higher level constraints are
not processed by P3, the response may contain sensitive information.
One way to overcome this problem is to analyze the constraints before
they are enforced so that it is not possible for higher level
constraints to have an impact on lower level query processing. More
research needs to be done on constraint analysis.
2.3 TEST SCENARIOS 

In this section we illustrate the processing of the query processor
with some examples.

Let the database consist of two relations SHIPS and GROUPS. The
attributes of SHIPS are Number, Name, Class, Date, and Assignment. Its
primary key is Number. The attributes of GROUPS are Number, Location,
Mission, and Siop. Its primary key is Number. We assume that
SHIPS.Assignment and GROUPS.Number take values from the same domain.
Also, SHIPS.Assignment is a foreign key. The database is populated as
shown below. To simplify the discussion we assume that both SHIPS and
GROUPS are assigned level 1. Furthermore, all of the tuples are also
stored at level 1. Note that the usual DOD classification levels do not
exist as such in the secure DBMS that we have used. We assume that the
number 1 denotes the Unclassified level, the number 10 denotes the
Secret level, and the number 16 (which is system-high) denotes the
TopSecret level. The user is assumed to be logged in at level 1. The
table #filter_temp1 is a temporary work table used to store the result.



Relation SHIPS

Number    Name                 Class                Date    Assignment
CVN68     Nimitz               Nimitz               May 75  003
CV67      John F Kennedy       John F Kennedy       Sep 68  001
BB 61     Iowa                 Iowa                 Feb 43  003
CG47      Ticonderoga          Ticonderoga          Jan 83  005
DD 963    Spruance             Spruance             Sep 75  006
AGF3      La Salle             Converted Raleigh    Feb 64  003
WHEC715   Hamilton             Hamilton             Feb 67  003
FFG7      Oliver Hazard Perry  Oliver Hazard Perry  Dec 77  001
FF1052    Knox                 Knox                 Apr 69  001
LSD 36    Anchorage            Anchorage            Mar 69  009
LHA 1     Tarawa               Tarawa               May 76  003
MCM 1     Avenger              Avenger              Sep 87  003
AOR1      Whichita             Whichita             Jun 69  003
AFS 1     Mars                 Mars                 Dec 63  001
AE21      Suribachi            Suribachi            Nov 56  009
AE23      Nitro                Nitro                May 59  005
AO 177    New Cimarron         New Cimarron         Jan 81  001
SSN706    Albuquerque          Los Angeles          May 83  006
CVN65     Enterprise           Enterprise           Nov 61  009
MSO 427   Constant             Aggressive           Sep 54  001






Relation GROUPS

Number  Location        Mission                 Siop
001     North Atlantic  naval exercises         001
002     South Atlantic  falklands patrol        002
003     Mediterranean   iraq crisis             006
004     Philippines     stabilize government    005
005     Persian Gulf    iraq crisis             004
006     Indian Ocean    naval exercises         004
007     North Sea       soviet reconnaissance   003
008     North Atlantic  oceanographic research  003
009     North Pacific   oceanographic research  001



TEST SCENARIO 1: No Constraints
Constraints active: NONE
Original query: select * from ships
User's level: 1

Final modified query: Same as the original query (that is, the query is
not modified)

select ships.number, ships.name, ships.class, ships.date, ships.assignment
into #filter_temp1 from ships

Note that the asterisk is a wildcard indicator which means the query is
for all attributes (fields) in a record. When the Inference Engine sees
this character it replaces it with all the field names in any tables
specified in the from clause.

Result: All of the tuples in SHIPS
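
The wildcard replacement noted in Scenario 1 can be sketched as follows. This is a minimal illustration, assuming a hypothetical in-memory catalog `SCHEMA` and a helper named `expand_wildcard`; the prototype presumably obtains the field names from the database server, and the sketch leaves out the `into #filter_temp1` clause.

```python
import re

# Hypothetical catalog: table name -> ordered field names, taken from the
# SHIPS and GROUPS relations defined above.
SCHEMA = {
    "ships": ["number", "name", "class", "date", "assignment"],
    "groups": ["number", "location", "mission", "siop"],
}


def expand_wildcard(query: str, schema=SCHEMA) -> str:
    """Replace 'select *' with every field of every table in the from clause."""
    m = re.match(
        r"\s*select\s+\*\s+from\s+(.+?)(\s+where\s+.*)?$",
        query,
        re.IGNORECASE | re.DOTALL,
    )
    if m is None:
        return query  # no wildcard: leave the query untouched
    tables = [t.strip().lower() for t in m.group(1).split(",")]
    fields = [f"{t}.{f}" for t in tables for f in schema[t]]
    where = m.group(2) or ""
    return f"select {', '.join(fields)} from {', '.join(tables)}{where}"


print(expand_wildcard("select * from Ships"))
```

Qualifying every field with its table name matters for the later scenarios, because the constraints are written against qualified names such as ships.class.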
TEST SCENARIO 2: Content constraints 

Constraints active:

ships.class = 'Belknap' -> Level(ships.class) = 16;
ships.class = 'Ticonderoga' -> Level(ships.class) = 16;
ships.class = 'Leahy' -> Level(ships.class) = 16;
ships.class = 'Charles F Adams' -> Level(ships.class) = 16;
ships.class = 'Ohio' -> Level(ships.class) = 16;
ships.class = 'Spruance' -> Level(ships.class) = 16;
ships.class = 'Iowa' -> Level(ships.class) = 16;
ships.class = 'Aggressive' -> Level(ships.class) = 16;
ships.class = 'Mars' -> Level(ships.class) = 16;
ships.class = 'Nimitz' -> Level(ships.class) = 16;
ships.class = 'Los Angeles' -> Level(ships.class) = 16;
ships.class = 'John F Kennedy' -> Level(ships.class) = 16;
ships.class = 'Enterprise' -> Level(ships.class) = 16;
ships.class = 'Anchorage' -> Level(ships.class) = 16;
Original query: select * from ships
User's level: 1
Final modified query:

select ships.number, ships.name, ships.class, ships.date, ships.assignment
into #filter_temp1 from ships where
(not (ships.class = 'Belknap')) and
(not (ships.class = 'Ticonderoga')) and
(not (ships.class = 'Leahy')) and
(not (ships.class = 'Charles F Adams')) and
(not (ships.class = 'Ohio')) and
(not (ships.class = 'Spruance')) and
(not (ships.class = 'Iowa')) and
(not (ships.class = 'Aggressive')) and
(not (ships.class = 'Mars')) and
(not (ships.class = 'Nimitz')) and
(not (ships.class = 'Los Angeles')) and
(not (ships.class = 'John F Kennedy')) and
(not (ships.class = 'Enterprise')) and
(not (ships.class = 'Anchorage'))

Result: 



Number    Name                 Class                Date    Assignment
AGF3      La Salle             Converted Raleigh    Feb 64  003
WHEC715   Hamilton             Hamilton             Feb 67  003
FFG7      Oliver Hazard Perry  Oliver Hazard Perry  Dec 77  001
FF1052    Knox                 Knox                 Apr 69  001
LHA 1     Tarawa               Tarawa               May 76  003
MCM 1     Avenger              Avenger              Sep 87  003
AOR1      Whichita             Whichita             Jun 69  003
AE21      Suribachi            Suribachi            Nov 56  010
AE23      Nitro                Nitro                May 59  005
AO 177    New Cimarron         New Cimarron         Jan 81  001
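
The modification in Scenario 2 amounts to appending one negated condition for every content constraint whose level exceeds the user's level. A minimal sketch of that step follows; the names `ContentConstraint` and `add_content_filters` are hypothetical, and joining the terms onto an existing where clause with "and" is an assumption that holds only when that clause contains no top-level "or".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ContentConstraint:
    attribute: str  # qualified attribute, e.g. "ships.class"
    value: str      # value that triggers the classification
    level: int      # level assigned when attribute = value


def add_content_filters(query: str,
                        constraints: List[ContentConstraint],
                        user_level: int) -> str:
    """Append 'not (attr = value)' for each constraint above the user's level."""
    blocked = [c for c in constraints if c.level > user_level]
    if not blocked:
        return query
    terms = []
    for c in blocked:
        literal = c.value.replace("'", "''")  # double embedded quotes
        terms.append(f"(not ({c.attribute} = '{literal}'))")
    joiner = " and " if " where " in query.lower() else " where "
    return query + joiner + " and ".join(terms)


constraints = [
    ContentConstraint("ships.class", "Iowa", 16),
    ContentConstraint("ships.class", "Nimitz", 16),
]
print(add_content_filters("select ships.name from ships", constraints, 1))
```

A user logged in at level 16 would receive the query unchanged, since none of the constraints specify a level above 16.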



TEST SCENARIO 3: Logical Constraints
Constraints active:

Logical(groups.location -> groups.mission);
Level(groups.mission) = 16

Original query: select ships.name, groups.location, groups.siop from ships,
groups where ships.assignment = groups.number
Final modified query:

select ships.name, groups.siop into #filter_temp1 from ships, groups where
ships.assignment = groups.number

Results:

Name                 Siop
Nimitz               006
John F Kennedy       001
Iowa                 006
Ticonderoga          004
Spruance             004
La Salle             006
Hamilton             006
Oliver Hazard Perry  001
Knox                 001
Anchorage            001
Tarawa               006
Avenger              006
Whichita             006
Mars                 001
Suribachi            001
Nitro                004
New Cimarron         001
Albuquerque          004
Enterprise           001
Constant             001



TEST SCENARIO 4: Association Constraint (or Together Constraint)
Constraints active:

Level(Together(groups.mission, groups.location)) = 10
Original query: select * from groups
User's level: 1
Final modified query:

select groups.number, groups.location, groups.siop into #filter_temp1 from
groups

Results:

Number  Location        Siop
001     North Atlantic  001
002     South Atlantic  002
003     Mediterranean   006
004     Philippines     005
005     Persian Gulf    004
006     Indian Ocean    004
007     North Sea       003
008     North Atlantic  003
009     North Pacific   001



TEST SCENARIO 5: Content and Logical Constraints
Constraints active:

Logical(Groups.Mission -> Groups.Location)
Groups.Location = 'Persian Gulf' -> Level(Groups.Location) = 16
Original query: select * from groups
Final modified query:

select groups.number, groups.location, groups.mission, groups.siop into
#filter_temp1 from groups where (not (groups.location = 'Persian Gulf'))

Number  Location        Mission                 Siop
001     North Atlantic  naval exercises         002
002     South Atlantic  falklands patrol        002
003     Mediterranean   iraq crisis             006
004     Philippines     stabilize government    005
006     Indian Ocean    naval exercises         004
007     North Sea       soviet reconnaissance   003
008     North Atlantic  oceanographic research  003
009     North Pacific   oceanographic research  001



TEST SCENARIO 6: Release Constraint

Constraints active: Release(ships.assignment:1) -> Level(ships.name) = 10
(i.e. if ships.assignment is released at level 1, then ships.name is
classified at level 10)

Original query: select * from ships
User's level: 1

Results released previously were cleared before executing this query.
Release constraint triggered by the release of:
ships.assignment at level 1, ships.name can't appear in query.
Final modified query:

select ships.number, ships.class, ships.date, ships.assignment into
#filter_temp1 from ships

Result:

Number    Class                Date    Assignment
CVN68     Nimitz               May 75  003
CV67      John F Kennedy       Sep 68  001
BB61      Iowa                 Feb 43  003
CG47      Ticonderoga          Jan 83  005
DD963     Spruance             Sep 75  006
AGF3      Converted Raleigh    Feb 64  003
WHEC715   Hamilton             Feb 67  003
FFG7      Oliver Hazard Perry  Dec 77  001
FF1052    Knox                 Apr 69  001
LSD 36    Anchorage            Mar 69  010
LHA 1     Tarawa               May 76  003
MCM1      Avenger              Sep 87  003
AOR 1     Whichita             Jun 69  003
AFS 1     Mars                 Dec 63  001
AE21      Suribachi            Nov 56  010
AE23      Nitro                May 59  005
AO 177    New Cimarron         Jan 81  001
SSN706    Los Angeles          May 83  006
CVN 65    Enterprise           Nov 61  010
MSO 427   Aggressive           Sep 54  001

Release Table contents:

Name              Level
ships.number      1
ships.class       1
ships.date        1
ships.assignment  1
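
The bookkeeping behind Scenario 6 can be sketched as below. The names `ReleaseConstraint` and `apply_release_constraints` are hypothetical. The sketch also assumes, as an inference from the scenario rather than a stated rule, that an attribute about to be released by the current query counts as released when constraints are checked; it does not handle chains in which one constraint's target is another's trigger.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ReleaseConstraint:
    trigger: str        # attribute whose release fires the constraint
    trigger_level: int  # level at which the trigger is released
    target: str         # attribute that is reclassified
    target_level: int   # level assigned to the target once fired


def apply_release_constraints(
    attrs: List[str],
    constraints: List[ReleaseConstraint],
    released: Dict[str, int],
    user_level: int,
) -> Tuple[List[str], Dict[str, int]]:
    """Return the attributes kept in the query and the updated release table."""
    kept = list(attrs)
    for c in constraints:
        already_released = released.get(c.trigger) == c.trigger_level
        released_by_this_query = c.trigger in kept and user_level == c.trigger_level
        fired = already_released or released_by_this_query
        if fired and c.target_level > user_level and c.target in kept:
            kept.remove(c.target)  # the target is now above the user's level
    updated = dict(released)
    for a in kept:
        updated.setdefault(a, user_level)  # record what this query releases
    return kept, updated


kept, table = apply_release_constraints(
    ["ships.number", "ships.name", "ships.class", "ships.date", "ships.assignment"],
    [ReleaseConstraint("ships.assignment", 1, "ships.name", 10)],
    released={},
    user_level=1,
)
print(kept)
print(table)
```

With the release table cleared beforehand, the sketch drops ships.name and records the four remaining attributes at level 1, which matches the modified query and the Release Table contents shown above.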

TEST SCENARIO 7: Aggregate Constraint

Constraints active: Aggregate(10) -> Level(ships.name) = 12;

Original query: select * from ships

User's level: 1

Final query: Same as original query

select ships.number, ships.name, ships.class, ships.date, ships.assignment
into #filter_temp1 from ships

Result: No result returned for the query since more than 10 ship names
would have been returned.

TEST SCENARIO 8: Aggregate Constraint

Constraints active: Aggregate(10) -> Level(ships.name) = 12;

Original query: select * from ships where number like '%CV%'

Final query as modified:

select ships.number, ships.name, ships.class, ships.date, ships.assignment
into #filter_temp1 from ships where number like '%CV%'

Result:

Number  Name            Class           Date    Assignment
CVN 68  Nimitz          Nimitz          May 75  003
CV 67   John F Kennedy  John F Kennedy  Sep 68  001
CVN 65  Enterprise      Enterprise      Nov 61  010
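
Scenarios 7 and 8 differ only in how many ship names the query would return. A minimal sketch of that check follows, assuming the response is available as a list of dictionaries; the function name `enforce_aggregate` and its parameters are hypothetical, and withholding the entire result is taken from the outcome of Scenario 7.

```python
from typing import Dict, List


def enforce_aggregate(rows: List[Dict[str, str]],
                      attribute: str,
                      threshold: int,
                      aggregate_level: int,
                      user_level: int) -> List[Dict[str, str]]:
    """Withhold the result when more than `threshold` values of `attribute`
    would be released to a user below `aggregate_level`."""
    count = sum(1 for r in rows if r.get(attribute) is not None)
    if count > threshold and aggregate_level > user_level:
        return []  # the aggregate is classified above the user's level
    return rows


all_ships = [{"name": f"ship {i}"} for i in range(20)]    # 20 tuples, as in SHIPS
carriers = [{"name": n} for n in ("Nimitz", "John F Kennedy", "Enterprise")]

print(len(enforce_aggregate(all_ships, "name", 10, 12, 1)))  # Scenario 7
print(len(enforce_aggregate(carriers, "name", 10, 12, 1)))   # Scenario 8
```

The check has to run on the response rather than on the query text, because the number of tuples is only known after the MLS/DBMS has evaluated the query.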



3 DESIGN AND IMPLEMENTATION OF THE
UPDATE PROCESSOR
3.1 OVERVIEW

MLS/DBMSs ensure the assignment of a security level to data as data is
inserted or modified. The security level assigned to the data, however,
is generally assumed to be the login security level of the user
entering the data. A more powerful and dynamic approach to assigning
security levels to data is through the utilization of security
constraints, or classification rules, during update operations. This
section provides an overview of the functionality and utilization of a
tool, the Update Processor, that utilizes security constraints as its
mechanism for determining the security level of data being inserted or
modified. Descriptions of the security policy and of the types of
security constraints addressed by the Update Processor are also
included.

3.1.1 SECURITY POLICY 

The security policy of the Update Processor is formulated from the
simple security property in Bell, D., and L. La Padula, July 1975,
"Secure Computer Systems: Unified Exposition and Multics
Interpretation," Technical Report NTIS AD-A023588, The MITRE
Corporation and from a security policy provided by our underlying MLS
DBMS, SYBASE'S Secure SQL Server. This policy is as follows:

1. All users are granted a maximum clearance level. A user may log in
at any level that is dominated by his maximum clearance level. Subjects
act on behalf of users at the user's login security level.

2. Objects are the rows, tables, and databases, and every object is
assigned a security level upon creation.

3. A subject has read access to an object if the security level of the
subject dominates the security level of the object.

4. A subject has write access to an object if the security level of the
object dominates the security level of the subject.

Statements 3 and 4 of the policy presented above are the simple
security property and the *-property of the Bell and LaPadula policy.
Since the Secure SQL Server by default polyinstantiates with updates,
we are utilizing the more relaxed security policy offered by the Secure
SQL Server. This less strict security policy is provided via the
relaxation property option. The relaxation property does polyinstantiate
with inserts, does not polyinstantiate with updates and allows users to
delete tuples which their login security level dominates. More details
on the security policy of the Secure SQL Server are provided in
Rougeau, P., and E. Stearns, "The Sybase Secure Database Server: A
Solution to the Multilevel Secure DBMS Problem," Proceedings of the
10th National Computer Security Conference, Baltimore, Md., 1987.
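
Statements 1, 3, and 4 of the policy reduce to three dominance checks. The sketch below assumes that security levels are plain integers, as in the 1 to 16 numbering used for the test scenarios, so that "dominates" is simply "greater than or equal to"; the function names are hypothetical.

```python
def dominates(a: int, b: int) -> bool:
    """Level a dominates level b (integer levels are an assumption)."""
    return a >= b


def may_login(max_clearance: int, login_level: int) -> bool:
    # Statement 1: the login level must be dominated by the maximum clearance.
    return dominates(max_clearance, login_level)


def may_read(subject_level: int, object_level: int) -> bool:
    # Statement 3: the subject's level dominates the object's level.
    return dominates(subject_level, object_level)


def may_write(subject_level: int, object_level: int) -> bool:
    # Statement 4: the object's level dominates the subject's level.
    return dominates(object_level, subject_level)


print(may_read(10, 1), may_write(10, 1))
```

Under these checks a subject at level 10 can read a level 1 row but cannot write to it, which is the behaviour that prevents information from flowing downward.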

3.1.2 FUNCTIONALITY OF THE UPDATE 
PROCESSOR 

The Update Processor utilizes simple and content- 
dependent security constraints as guidance in determin- 
ing the security level of the data being updated. The use 
of security constraints can thereby protect against users 
incorrectly labelling data as a result of logging in at the 
wrong level, against data being incorrectly labelled 
when it is imported from systems of different modes of 
operation such as a system high, and against database 
inconsistencies as a consequence of the security label of 
data in the database being affected by data being entered 
into the database. 

The security level of an update request is determined 
by the Update Processor as follows. The simple and 
content-dependent security constraints associated with 
the relation being updated and with a security label 
greater than the user login security level are retrieved 
and examined for applicability. If multiple constraints 
apply, the security level is determined by the constraint 
that specifies the highest classification level. If no con- 
straints apply, the update level is the login security level 
of the user. The Update Processor, therefore, does not 
determine the security level of the data solely from the 
security constraints, but utilizes the constraints as guid- 
ance in determining the level of the input data. The 
following examples illustrate the functionality of the 
Update Processor. 
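
Before turning to the examples, the level-selection rule just described can be sketched in a few lines. The names `Constraint` and `update_level` are hypothetical, and the number used for the confidential level is an assumption, since the text gives numbers only for the Unclassified, Secret, and TopSecret levels.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

CONFIDENTIAL = 5  # hypothetical value; not given in the text
SECRET = 10       # numbering used in the test scenarios of section 2.3
TOP_SECRET = 16


@dataclass(frozen=True)
class Constraint:
    relation: str                              # relation the constraint is attached to
    applies: Callable[[Dict[str, str]], bool]  # always True for a simple constraint
    result_level: int                          # level the constraint specifies


def update_level(relation: str, row: Dict[str, str], login_level: int,
                 constraints: List[Constraint]) -> int:
    """Highest level among applicable constraints above the login level,
    or the login level itself when none applies."""
    levels = [c.result_level for c in constraints
              if c.relation == relation
              and c.result_level > login_level
              and c.applies(row)]
    return max(levels, default=login_level)


georgia_rule = Constraint("SHIPS", lambda r: r["name"] == "Georgia", SECRET)
print(update_level("SHIPS", {"name": "Georgia"}, CONFIDENTIAL, [georgia_rule]))
```

Because only constraints above the login level are considered, the function can never return a level below the one at which the user logged in.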

Consider a database that consists of a relation SHIPS 
whose attributes are number, name, class, date, and 
assignment, and number as its primary key. The con- 
tent-based constraint which classifies all ships with 
name Georgia as secret is expressed as: 
SHIPS.name = "Georgia" -> Secret.
A user at login security level confidential enters the 

following data to insert a tuple into the SHIPS relation: 
Insert SHIPS values ("SSBN 729", "Florida", "Ohio", "Feb 84", "008").
The Update Processor will receive this insert and retrieve the
constraints associated with the SHIPS relation which specify a level
greater than the user level, which is confidential, and whose level is
less than or equal to the user level. The content-based constraint
stated above is retrieved. Since the data entered for the name field is
not "Georgia", the security constraint associated with the SHIPS
relation will not affect the classification level of the insert, and
the Update Processor will determine the insert level to be the user
level, which is confidential.

Suppose a user at login security level confidential then enters the
following: Insert SHIPS values ("SSBN 730", "Georgia", "Ohio",
"Mar 89", "009"). The Update Processor will again retrieve the
content-based constraint associated with the SHIPS relation, which
specifies a level greater than the user level and whose level is less
than or equal to the user level. Since the data for the name field is
"Georgia", the Update Processor will determine the insert level to be
secret. If, however, the user entered this insert at login security
level top secret, the Update Processor would perform the insert at the
user level since the user level is higher than the level specified by
the security constraint.

The update operation of the Update Processor functions similarly to the
insert operation. As an example, suppose a user at the confidential
level enters the following: Update SHIPS set name = "Georgia" where
class = "Ohio". The Update Processor will retrieve the security
constraints associated with the SHIPS relation which specify a level
greater than the user level and whose level is less than or equal to
the user level. The content-dependent constraint stated above will be
retrieved, and the Update Processor will determine the update level to
be secret since the name field is being modified to "Georgia". The
tuple with a primary key of "SSBN 729" as defined above will then be
updated at the secret level, and the original tuple will be deleted.

In addition to describing the functionality of the Update Processor,
the examples above illustrate the potential signaling channels that
exist when operating with the Update Processor. A signaling channel is
a form of covert channel which occurs when the actions of a high user
or subject interfere with a low user or subject in a visible manner.
Potential signaling channels occur when data is entered at a level
higher than the user level and the user attempts to retrieve the data
that he has entered, or when the Update Processor attempts to enter
data at a higher level, but cannot since a tuple with the same primary
key already exists at this level. We will discuss the potential
signaling channels that could occur operating with the Update Processor
and our solutions in Section 3.2.5.

3.1.3 UTILIZATION OF THE UPDATE PROCESSOR

An MLS DBMS provides assurance that all objects in a database have a
security level associated with them and that users are allowed to
access only the data for which they are cleared. Additionally, it
provides a mechanism for entering multilevel data but relies on the
user to login at the level at which the data is to be entered. The
Update Processor will provide a mechanism that can operate as a
standalone tool with a MLS DBMS to provide assurance that data is
accurately labelled as it is entered into the data base. This could
significantly enhance and simplify the ability of an MLS DBMS to assure
that data entered via bulk data loads and bulk data updates is
accurately labelled.

Another significant use for an Update Processor is in operation with a
Query processor which functions during query processing. The Query
processor protects against certain security violations via inference
that can occur when users issue multiple requests and consequently
infer unauthorized knowledge. The Query processor Prototype also
utilizes security constraints as its mechanism for determining the
security level of data. The security constraints are used as derivation
rules as they are applied to the data during query processing.

Addressing all of the security constraint types mentioned above could
add a significant burden to the query processor particularly if the
number of constraints is high. To enhance the performance of the query
processor, the Update Processor can be utilized to address certain
constraint types as data is entered into the database, in particular,
simple and content-based constraints, alleviating the need for the
query processor to handle these constraint types. We assume that the
security constraints remain relatively static, as reliance on the
Update Processor to ensure that data in the database remains consistent
would be difficult, particularly in a volatile environment where the
constraints change dynamically. An additional concern is that database
updates could leave the database in an inconsistent state. The Update
Processor, however, is designed to reject updates that cause a rippling
effect and thus leave the database in an inconsistent state.

3.2.1 REPRESENTATION OF SECURITY CONSTRAINTS

The update processor handles the simple and content-based constraints.
While the graph structure representation in section 2 [...] of
constraints, we felt that it would be more efficient to store a large
number of constraints in the database. Therefore, for the update
processor prototype we decided to store the constraints in the
database. As before the constraints were classified at different
security levels, but [...] manipulate the constraint table. The
constraint manager, which is a trusted process, would ensure that only
a user classified at level L could read the constraints [...]

TABLE 1
Constraint Table

C_ID  C_LEVEL  RESULT_REL_NAME  CONDITION          RESULT_LEVEL
1     6        SHIPS            class = "Georgia"  10
2     6        SHIPS            name = "Florida"   11
3     6        SHIPS            name = "Georgia"   12
4     6        SHIPS_CLASS      1 = 1              8

The CONSTRAINT table, populated with example constraints, is presented
in Table 1. The definition of the field names follows.
CONSTRAINT.c_id is the primary key for the table and contains a unique
constraint identifier. CONSTRAINT.c_level is the constraint level. Only
data entered by users with a login security level at or above this
constraint level will be affected by this constraint.
CONSTRAINT.result_rel_name_id is the relation name associated with the
constraint. CONSTRAINT.condition is the expression of the condition for
a content-based constraint, and CONSTRAINT.result_level is the level
specified by the constraint. An additional field which we recommend
adding to the CONSTRAINT table is a CONSTRAINT.status field to indicate
whether the constraint is currently active or inactive. The capability
to change the status of constraints is particularly useful for an
application whose constraints change dynamically.

3.2.2 ASSUMPTIONS

In implementing the Update Processor, the following assumptions were
made. Examples are given for clarification when necessary.

1. Users can only update tuples they can see. If a user updates a tuple
that exists at his login security level and the Update Processor
determines the update security level to be higher than the user's login
security level, the Update Processor will [...] higher level. However,
if a tuple with the same primary key already exists at this higher
level, the update request will be rejected, as the user would in fact
be updating a tuple whose security label is greater than his login
security level. The Update Processor will return a request failed
message to the user. We recommend that a request of this type be
audited and that an SSO be alerted to resolve the conflict.

2. An update request will be aborted if it leaves the database in an
inconsistent state. This may occur with the existence of more complex
constraints on multiple relations. As an example: Given the constraint
which references the SHIPS and SHIPS_CLASS tables,
SHIPS_CLASS.length = "20" -> Level(SHIPS.name) = 9. If the
SHIPS_CLASS.length field is updated to be equal to "20," then data in
the SHIPS table where SHIPS_CLASS.classification = SHIPS.class and
SHIPS_CLASS.length = "20" may be labelled inaccurately. An update of
this type that will leave the database in an inconsistent state will be
aborted.*

*[...] such inconsistencies.

3. If a user requests an update at a login security level that is
higher than the level determined by the Update Processor, the SSO will
examine the request and, if acceptable, will allow the update to be
executed at the user level. The Update Processor thereby allows for the
overclassification of data.

4. The Update Processor operates with the more relaxed security policy
provided by SYBASE'S relaxation property option. Operating with this
option alleviates the need for the Update Processor to delete the
original lower-level tuple when updating a tuple to a higher level
since polyinstantiation is not supported with updates.

3.2.3 ALGORITHM FOR ASSIGNING SECURITY LEVELS TO DATA

Insert Request

The algorithm used by the Update Processor to determine the security
level of data being inserted is as follows. Once an insert request is
received, the request is parsed to retrieve the relation name. The
Update Processor then searches the CONSTRAINT table for all constraints
where CONSTRAINT.result_rel_name equals the relation name in the
request, where the CONSTRAINT.c_level is less than or equal to the user
login security level, and where the CONSTRAINT.result_level is greater
than the user login security level. The applicable constraints are then
ordered in descending order by CONSTRAINT.result_level. The constraints
are ordered as such to alleviate the need to examine all the
constraints. The CONSTRAINT.result_level of the first constraint that
applies will be the insert level determined by the constraints.
Following the retrieval from the CONSTRAINT table, the initial insert
request is inserted into an empty temporary table. A select statement
is then built using the temporary table as the relation and the
condition from the first applicable constraint as the where clause. If
this select statement successfully retrieves the row in the temporary
table, then the constraint applies. The CONSTRAINT.result_level for
this constraint is the level at which the Update Processor will request
the insert to the Secure SQL Server.

If, however, the select statement does not retrieve the row [...]
temporary table [...] for the next applicable constraint. If the
algorithm is complete and no constraints apply, then the insert level
is determined to be the user login security level.

An example of the insert algorithm is as follows. Consider the insert
requested by a user at security level 6 on the ships database that has
defined to it the constraints as specified in the CONSTRAINT table in
Table 1:

Insert SHIPS values ("SSBN 730", "Georgia", "Ohio", "Feb 84", "009").

Three constraints are retrieved from the CONSTRAINT table in the order
CONSTRAINT.c_id = "3", CONSTRAINT.c_id = "2", CONSTRAINT.c_id = "1".
The insert request is then modified to allow the data to be inserted
into an empty temporary table that has the same schema as the SHIPS
table. The temporary table, #insert_temp, is created using SQL as
follows: select * into #insert_temp from rel_name where 1=2, where
rel_name is the relation name of the insert request. The data is then
inserted into the temporary table with the following insert request:

insert #insert_temp values ("SSBN730", "Georgia", "Ohio", "Feb 84", "009").

Next, a select statement is built from the CONSTRAINT.condition data
for CONSTRAINT.c_id = "3" and this temporary table. The select
statement is:

select * from #insert_temp where name = "Georgia".

Since the select statement successfully retrieves the one tuple in the
temporary table, this constraint applies, and the insert level is
determined to be 12, which is the CONSTRAINT.result_level for this
constraint. Although the other constraints may apply, the
CONSTRAINT.result_level for these constraints is less than 12, so it is
not necessary to examine them.

Update Request

The algorithm used by the Update Processor to determine the security
level of data being updated is as follows. The request is parsed to
retrieve the relation name. The Update Processor then searches the
CONSTRAINT table, as it does for an insert request, for all constraints
where CONSTRAINT.result_rel_name equals the relation name in the
request, where the CONSTRAINT.c_level is less than or equal to the user
login security level and where the CONSTRAINT.result_level is greater
than the user login security level. The applicable constraints are then
ordered in descending order by CONSTRAINT.result_level. Following the
retrieval from the CONSTRAINT table, a temporary table is created with
tuples from the relation in the update request that satisfy the where
clause in the up-
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date request. This temporary table is then utilized as it utilized to provide a common interface for the Update Processor Proto- 

was for an insert request, i.e. as a mechanism to check the Q u ei y ^essor Prototype, 

if the constraints selected from the CONSTRAINT TT ^ ?% e detaiIed P™?^™** * e design ° f the 

table apply. Select statements are built using the tempo- U P date Fro ~ 18 m /!°'^ TlaB flgUre 

rarytableastherelationandtheconditionfromtheto 5 ^ vemew ** com P ^se J ^^ U P d t te 

applicable constraint as the where clause. If the select P rocesSOT - JJe Update Processor is modularized by 

statement successfully retrieves any rows from the ten> ™ d b * the security level at which the function 

porary table, then the constraint applies. The CON- 18 r *?^ *° °P erate ' B ?° h module 15 m P leme nted as 

STRAINT.result_level for this constraint is the level at 9X1 ^ T ^ Process, and process communication is via 

which the Update Processor will request the update to io sockets * underlying TCB must provide a reliable 

the Secure SQL Server. interprocess communication (IPC) mechanism for com- 

As with an insert request, if the select statement does munication. A description of the functions of these 

not retrieve any rows in the temporary table, the tempo- modules is provided below. With is description is a 

rary table is deleted, and the algorithm repeats for the discussion on the security level at which these processes 

next applicable constraint. If the algorithm completes 15 11111 whether they are trusted or untrusted pro- 

and no constraints apply, the update level is determined cesses. 

t be the user login security level T^ 6 update Processor employs a process structure 
The following example illustrates the algorithm for similar to the Query processor to allow for integration 
update requests. Consider the update request by a user witi* the Query processor Prototype, 
at login security level 6 on the SHIPS table which con- 2 Q Process p * provides a similar user interface to pro- 
tains the tuple ("SSBN 728", "Lafayette", "Lafayette", cess PI of the Query processor Prototype, Additionally, 
"Jun 83", "009*') and operates with constraints as de- the functionality of process P2 could be integrated with 
fined in Table 1: the functionality of process P2 of the Query processor 
update SHIPS set name- "Florida" where name= 4 - Prototype. However, processes P3 and P4 of the Up- 
'Lafayette". 25 date Processor must not be confused with processes P3 
Three constraints are retrieved from the CON- and P4 of the Query processor Prototype. Processes P3 
STRAINT table in the order CONSTRAINT.- and P4 are specific t the Update Processor. Should the 
c_id="3", CONSTRAINT.c-id="2'\ CON- Update Processor Prototype be integrated with the 
STRAINT.c-id — "1", Query processor Prototype, processes P3 and P4 of the 
A select statement is then bufit that selects into a tempo- 30 Update Processor must remain unique processes, 
rary table the tuples that satisfy the condition "where Process PI: User Interface Manager 
name="Lafayette", which is the where clause of the The User Interface Manager process, process PI, 
update request The select statement is: "select * into provides a user interface to the Update Processor proto- 
#update-temp from SHIPS where name= "Lafayette" type. At start-up, PI prompts the user for a password 
and see_Iabel =convert(binary,user_see_label), 35 and security level The level specified by the user is the 
where the test for the security label ensures that only level at which this process runs. Next, PI prompts the 
tuples less than or equal to the user security label are user for a database request and remains in idle until it 
selected since the process that performs this operation receives a request Upon receiving a request, PI logs 
runs at system high. Once the temporary table is built, ^to the Secure SQL Server using the user's userid, 
the update request is modified to update the temporary m password, and clearance. The request is then sent to the 
table. Then, the select statement is built using the where server for a syntax checki ^ returns a mesS age 
clause of the first applicable constraint as follows: indicating the result of the syntax check. If successful, 
"select * from fupdate- temp where name^ "Georgia" communication is established with process P2, the Up- 

*.„.., . date Processor Controller, and the request, along with 

Since tins select statement does not retrieve any rows 45 ±Q login packet that contains the useridf login security 

from the temporary table, the temporary table is de- Ievel> ^ password of thc user , is route d to P2. PI then 

leted, and the algonthm repeats for the next constraint rem ains waiting for a response which wfll indicate the 

SHIPS.name^ Florida' === Le- success or Mure of the transaction from P2. Once a 

vel(SHire)=ll, will apply, and the update level will nse h received from p2 pi ^ ^ j ^ re _ 

be n.T^efoUowmg subsec^ 5Q sponse to the user, and PI will again prompt the user for 

% u <= mothQT rec * uest If the user chooses ^ enter a request at 

3 2.4 MODULES OF THE UPDATE PROCES- a login security ^ prQCess pi tQ 

a u* 1. 1 1 * c . ■? t t a tj be restarred, at which point the user will again be 

A high-level architecture for die Update Processor ted for a rd ^ 

prototype is provided in FIG. 9. Abnef descnphon for „ The User Interface Manager operates as the front-end 

datailow within the prototype is as follows: the User to the Update Processor and doe^ not perform security- 

Interface* accepts a users mput and sends the mput to i _ r» T ^ • . . , „ , - / 

the Secure SQL Server for a syntax check. If the syntax °P er ^ n ?-. rt from * e 

is correct, the user kterfaJroutes the input t^the JfT2S2 SST h f k " 

Update Processor. The Update Processor Accepts the a j^E^EST 

^dacc^^^^^ec^M^i^ Process P2: UpdateProcessor Controller 

mput usmg the security constraints as a guideline, and tt™*„*- r> ™„ ^ * n „ 

establishes a connection to the Secure SQL Server at ^Bl^tf;^ 

thedetenninedsecuritylevelforexecutionofthetrans- ^f^f 
action The Uodate Processor then sends a messaee (P rocess P3 ). *e Level Upgrader (process P4), and the 
action, ine urKiate rrrcessormen sends a message 65 Secure S ql Server in determining the level of the up- 
back to the User Interface indicating the completion , , > „- „ * * £ 
status of the transaction date and in performing the update. Upon start-up, P2 
Minor changes were made to the user interface of the query processor idl ^' Waiting for a request for PI. Upon receiving the 
for use by the Update Processor Prototype. This User Interface was login packet and the request, P2 logs into the Secure 
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SQL Server utilizing the userid, password, and clear- serted/updated at the determined level. The Update 
ance in the login packet. Thus, P2 runs at the user level. Processor can ensure that data is accurately labelled 
The Update Processor controller then examines the when a user enters data while logged in at the wrong 
request to determine if it is a select, an insert, an update, level, when data is imported from systems of different 
or a delete. If the request is an insert or an update, some 5 modes of operation, such as a system high, or when the 
preliminary processing is performed on the request, and security level of data in the database is affected by data 
the request along with the login packet is sent to P3. P2 being entered into the database, 
remains idle, waiting for a response which will contain As discussed previously, in addition to operating as a 
the insert/update level from P3. If the level determined standalone tool, the Update Processor has been de- 
by P3 is greater than the user level, P2 invokes P4 to 10 signed to operate with the Query processor Prototype, 
perform the insert/update level from P3. If the level As such, some of the burden placed on the Query pro- 
determined by P3 is greater than the user level, P2 cessor can be alleviated since the simple and content- 
invokes P4 to perform the insert/update. P2 then idles, based constraints can be addressed by the Update Pro- 
waiting for a successor or failure response from P4. If cessor Operating inn an environment where users both 
the level determined by P3 is the user level, then P2 15 que ry and update the database, however, allows for the 
sends the request to the Secure SQL Server to perform occurrence of potential signaling channels. As an exam- 
ine inserffupdate. The Secure SQL Server returns a pie, in some cases the user cannot retrieve the data he 
completion status message to P2 t indicating whe^er the has entered. Since the security levels of the security 
transaction completed or failed. P2 then sends this com- constraints that determined the security level of the 
pletion status message to PI and waits for the next re- 20 mpu t is not at a level higher than the user level, i.e., the 
quest from PI. value of CONSTRAINT.c level for constraints used 
The Update Processor Controller provides assurance during update processing is the user level, we do not 
that the connection to the Secure SQL Server is estab- rcgar d this as a signaling channel The data in the CON- 
lished at the correct level, that the user's request is not STRAINT table is labelled at system high to allow an 
modified, and that the level determined by P3 is either 25 sso t0 mamtam t h e table,* but the CONSTRAINT.c 
the level at which the update is performed or the level level value reflects the trae levd of the constramt . 
sent to P4. As such, the Update Processor Controller is Therefore, if the constraint level, which is the value of 
a trusted process. CONSTRAINT.c level, is at or below the user level, 
Process P3: Update Constraint Gatherer we assume it is not the action of a high-level user or 
The Update Constraint Gatherer is responsible for 30 subject ^ is interfering with the result, 
determining the security level of the data utilizing the SYBASE requires an sso to be logged in at system high to have access 
applicable security constraints. Since P3 must have to SSO functions. 

access to the constraints that are stored in the CON- Another significant consideration with the Update 

STRAINT table, which is defined at system high, P3 Processor operating with the Query processor Proto- 

runs a system high- At start-up, P3 waits for a request 35 tv P e * s tne content of error messages. The content of 

from P2. Upon receiving a request, P3 determines the of tne Secure SQL Server's error messages, cou- 

security level of the insert or update, utilizing the aigo- P Ied tne ability to query the database, may enable 

rithms described above. P3 then sends this level to P2 a user to infer something about the security level of his 

and idles, waiting for another request insert/update. As an example, if it is determined that an 

Since the Update Constraint Gatherer determines the 40 insert request should be processed at a higher level, and 

level at which the insert/update will be performed, if a tuple with the same primary key already exists at the 

assurance must be provided that the applicable con- higher level, then a message that indicates that a dupli- 

straints are used and that the level determined by this ca*e key row already exists is sent by the Secure SQL 

process is accurate. This process, therefore, is a trusted Server to the Level Upgrader. If this message were 

process. 45 routed to the User Interface Manager, the user could 

Process P4: Level Upgrader infer that the data he entered exists at a higher level 

The Level Upgrader is the process that issues the Furthermore, he could infer that the data exists at a 

request to the Secure SQL Server at the level deter- higher level either because it was input by a user at a 

mined by P3 when the insert/update level determined higher level or because a security constraint exists that 

by P3 is greater than the user level. (Note: P2 runs at the 50 determined the insert level to be so. Through experi- 

user level.) At start-up, P4 wait for a request from P2. menting with additional inserts, the user could deter- 

Upon receiving the level from P2, P4 logs into the mine the existence of this security constraint. Our solu- 

Secure SQL Server at this level and sends the request to tion is to have the Level Upgrader interpret this error 

the server. The response from the server is examined, message to be a request failed message. A request failed 

and the completion status message is sent to P2. P4 then 55 message is then sent to the Update Processor controller, 

idles, awaiting another request from P2. who in turn sends it to the User Interface Manager that 

The Level Upgrader provides assurance that the displays it to the user. The user, therefore, is only aware 

level at which it requests the Secure SQL Server to that the request failed. To further resolve this confusion 

perform the insert/update is the level received from P2. for the user, we recommend that transactions of this 

P4 is therefore a trusted process. 60 type be audited and that the SSO be alerted to provide 

3.2.5 GENERAL DISCUSSION an explanation to the user if needed. 

In this section we provide a general discussion on the Performance is an additional concern with the Up- 

prototype implemented. Approximately 2500 lines of C date Processor. The response time of the Query proces- 

code was implemented for the Update Processor. As sor may improve with the use of the Update Processor, 

mentioned earlier, the Update Processor has the ability 65 but the response time for updates will be affected. This, 

to analyze a user's insert/update request, determine the however, is acceptable for an application whose per- 

security level of the data to be inserted/updated utiliz- centage of retrievals exceeds that of updates. Addition- 

ing security constraints, and ensure that the data is in- ally, should this functionality be incorporated into MLS 



DBMS, the effect on performance may not be signifi- 
cant since this functionality could exist as part of the 
DBMS kernel rather than as a user application as it 
currently exists. Regardless, we project that since the 
performance of updates, in general, is not quite as criti- 
cal as the performance of retrievals, the benefits from 
implementing this security functionality should out- 
weigh the projected minimal loss in performance. 

In general, the Update Processor provides functional- 
ity which is desirable in a multilevel operating environ- 
ment. The nature of the tool allows for it to operate as 
a standalone tool or in conjunction with a Query pro- 
cessor Additionally, this functionality could easily be 
adapted to operate with an existing MLD DBMS to 
enhance its security features. 
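The control flow of the Update Processor Controller (P2) and the error handling of the Level Upgrader (P4) described in section 3.2.4 and in the discussion of error messages can be sketched roughly as follows. This is only an illustration, not the prototype's C code: the names `controller`, `run_at_level`, `DuplicateKeyError`, and the two status strings are hypothetical, the `gatherer` and `server` callables stand in for P3 and the Secure SQL Server, and the handling of select and delete requests at the user level is an assumption.

```python
REQUEST_COMPLETED = "request completed"
REQUEST_FAILED = "request failed"


class DuplicateKeyError(Exception):
    """Hypothetical stand-in for the server's duplicate key row error."""


def run_at_level(level, request, server):
    """Send the request at the given level and reduce the outcome to a status.

    A duplicate key error is reported only as a generic failure, so no detail
    about data at a higher level reaches the user.
    """
    try:
        server(level, request)
    except DuplicateKeyError:
        return REQUEST_FAILED
    return REQUEST_COMPLETED


def controller(request, user_level, gatherer, server):
    """P2: decide at which level the request is executed."""
    kind = request.split(None, 1)[0].lower()
    if kind in ("insert", "update"):
        level = gatherer(request, user_level)  # P3 determines the level
        if level > user_level:
            # P4 path: the request is issued at the higher level
            return run_at_level(level, request, server)
    # Assumption: every other request runs at the user level
    return run_at_level(user_level, request, server)
```

The status returned by `controller` is what P1 would display, so the user learns only whether the request completed or failed.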

3.3 TEST SCENARIOS 

This section illustrates the functionality of the Update Processor. Included in this section is a description of

and content-based security constraints used by the Update Processor. The following four constraints are active for these tests.

1. SHIPS.class="Ohio" → Level(SHIPS)=10;
2. SHIPS.name="Florida" → Level(SHIPS)=11;
3. SHIPS.name="Georgia" → Level(SHIPS)=12;
4. → Level(SHIPS_CLASS)=8;
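How these four constraints determine the level of an insert can be sketched as below. This is a minimal illustration under stated assumptions, not the prototype's implementation: SQLite (via Python's `sqlite3`) stands in for the Secure SQL Server, the table name `insert_temp` replaces `#insert_temp`, the function `insert_level` and the tuple layout of `CONSTRAINTS` are hypothetical, and the `c_level` value of 1 for every constraint is assumed because the text does not give it.

```python
import sqlite3

# (c_id, result_rel_name, condition, c_level, result_level)
# The c_level values are assumptions; the conditions follow the four constraints.
CONSTRAINTS = [
    (1, "SHIPS", "class = 'Ohio'", 1, 10),
    (2, "SHIPS", "name = 'Florida'", 1, 11),
    (3, "SHIPS", "name = 'Georgia'", 1, 12),
    (4, "SHIPS_CLASS", None, 1, 8),  # simple constraint: no condition
]

SHIPS_COLUMNS = ("number", "name", "class", "date", "assignment")


def insert_level(rel_name, columns, row, user_level):
    """Return the level at which the row would be inserted."""
    applicable = [
        c for c in CONSTRAINTS
        if c[1] == rel_name and c[3] <= user_level and c[4] > user_level
    ]
    # Highest result level first, so the first match decides the level
    applicable.sort(key=lambda c: c[4], reverse=True)

    db = sqlite3.connect(":memory:")
    try:
        column_list = ", ".join('"%s"' % name for name in columns)
        db.execute("create table insert_temp (%s)" % column_list)
        marks = ", ".join("?" for _ in columns)
        db.execute("insert into insert_temp values (%s)" % marks, row)
        for _c_id, _rel, condition, _c_level, result_level in applicable:
            where = " where " + condition if condition else ""
            hit = db.execute("select * from insert_temp" + where).fetchone()
            if hit is not None:
                return result_level
        return user_level  # no constraint applies
    finally:
        db.close()
```

For a user at level 6, the tuple with name "Georgia" and class "Ohio" satisfies constraints 1 and 3, and the descending order means only constraint 3 needs to be tested. Unlike the description above, the sketch keeps one temporary table for all tests instead of rebuilding it per constraint.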
3.3.2 TEST SCENARIOS 

TEST SCENARIO 1: insert SHIPS values ("SSBN 728", "Lafayette", "Lafayette", "Jun 83", "009")

This scenario exemplifies an insert transaction that is not affected by the security constraints, since the value for SHIPS.class is not "Ohio", and SHIPS.name is not "Florida" or "Georgia". The following response to the SQL select statement demonstrates the results of this transaction.

1> select sec_label, * from SHIPS
2> go



sec_label                   number    name       class      date    assignment
0x060000000000000000000000  SSBN 728  Lafayette  Lafayette  Jun 83  009



our test database and our test scenarios. With each test 
scenario is a statement of input, the significance of the 
test, and the results of the test. Each scenario uses our
test database, the ships database, and each scenario is to 
be executed by a user at login security level 6. 
3.3.1 TEST DATABASE 

Our test database is the ships database. The ships 
database and all the relations in this database have been 
defined at level 1. The test database will initially be 
empty and will not be reinitialized with each scenario so 
the reader can see the results of utilizing the Update 
Processor as each transaction completes, as this is how 
it would operationally be used. 

All of our example transactions are against the
SHIPS and SHIPS_CLASS relations, which are
defined below.



The results indicate that the tuple was not affected by
the security constraints and was inserted at the user
level which is level 6.
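In the listings of this section the sec_label value appears to carry the level in its first byte (0x06 for a tuple at level 6, 0x0b for level 11, 0x0c for level 12). The text does not state the label format, so the following sketch rests on that inferred assumption; the function names and the sample rows are hypothetical and only illustrate how a login level limits which tuples a query returns.

```python
def level_from_sec_label(sec_label):
    """Return the level held in the first byte of a hex label such as '0x06...'."""
    digits = sec_label[2:] if sec_label.lower().startswith("0x") else sec_label
    return bytes.fromhex(digits)[0]


def visible_rows(rows, login_level):
    """Keep the rows whose label level is at or below the login level."""
    return [row for row in rows if level_from_sec_label(row[0]) <= login_level]


# Sample rows in the shape (sec_label, number, name); labels are 12 bytes long
ROWS = [
    ("0x06" + "00" * 11, "SSBN 728", "Lafayette"),
    ("0x0b" + "00" * 11, "SSBN 729", "Florida"),
    ("0x0c" + "00" * 11, "SSBN 730", "Georgia"),
]
```

With these rows, a login at level 6 would see only the SSBN 728 tuple, while a login at level 16 would see all three.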
TEST SCENARIO 2:

Insert SHIPS values("SSBN 729", "Florida", "Lafayette", "Jun 83", "009")

This scenario exemplifies an insert transaction that is affected by security constraint 2. The Update Processor actually retrieves the three security constraints associated with the SHIPS relation and examines them in descending order by constraint security level. As a result, security constraint 3 is examined, followed by security constraint 2. The following retrieval demonstrates the results of this transaction.

1> select sec_label, * from SHIPS
2> go



sec_label                   number    name       class      date    assignment
0x060000000000000000000000  SSBN 728  Lafayette  Lafayette  Jun 83  009



create table SHIPS
(number varchar(10),
name varchar(22),
class varchar(22),
date varchar(8),
assignment varchar(10))
unique index: number
create table SHIPS_CLASS
(classification varchar(50),
length varchar(50),



This result does not indicate that the tuple was inserted. However, the Update Processor returned a successful response to the user. The user, therefore, will assume that either his tuple has since been deleted or that it was inserted at a level higher than his login security level. The following response, submitted by a user at login level 16, demonstrates the results of this transaction.

1> select sec_label, * from SHIPS
2> go



sec_label                   number    name       class      date    assignment
0x060000000000000000000000  SSBN 728  Lafayette  Lafayette  Jun 83  009
0x0b0000000000000000000000  SSBN 729  Florida    Lafayette  Jun 83  009



disp varchar(7),
speed varchar(4),
missile varchar(15),
torpedo varchar(15),
gun varchar(15))
unique index: classification

As mentioned in section 3.3, each user database must contain a CONSTRAINTS table to store the simple

This response indicates that the tuple was inserted at level 11 since security constraint 2 was satisfied.

TEST SCENARIO 3:

Insert SHIPS values("SSBN 730", "Georgia", "Ohio", "Feb 84", "009")

This scenario exemplifies an insert transaction that is affected by more than one security constraint. The Update Processor retrieves the three security constraints associated with the SHIPS relation in descending order by constraint security level. However, the condition from security constraint 3 is satisfied, so that the insert level is determined to be 12, and the remaining constraints are not examined. The following response, submitted by a user at login level 16, demonstrates the results of this transaction. A user logged in at level 6 would retrieve only the tuple with number="SSBN 728".

1> select sec_label, * from SHIPS
2> go

sec_label                   number    name       class      date    assignment
0x060000000000000000000000  SSBN 728  Lafayette  Lafayette  Jun 83  009
0x0b0000000000000000000000  SSBN 729  Florida    Lafayette  Jun 83  009
0x0c0000000000000000000000  SSBN 730  Georgia    Ohio       Jun 85  006

TEST SCENARIO 4:

Update SHIPS set name = "Florida" where name="Lafayette"

This scenario exemplifies an update transaction that is affected by a security constraint. Without utilizing the Update Processor, this transaction would result in the modification to the name field in the tuple with number="SSBN 728". This tuple exists at level 6 which is the user's login security level. Utilizing the Update Processor, however, the name field will be modified as well as the classification level. Since the name is being set to "Florida", security constraint 2 will determine the update security level to be 11. The tuple will then be inserted at level 11, and the original tuple will be deleted, as we are running with the relaxation property on. The following response, submitted by a user at login level 16, demonstrates the results of this transaction. A user logged in at level 6 would not have access to the data currently in the SHIPS table.

1> select sec_label, * from SHIPS
2> go

sec_label                   number    name       class      date    assignment
0x0b0000000000000000000000  SSBN 729  Florida    Lafayette  Jun 83  009
0x0c0000000000000000000000  SSBN 730  Georgia    Ohio       Jun 85  006
0x0b0000000000000000000000  SSBN 728  Florida    Lafayette  Jun 83  009

TEST SCENARIO 5:

Insert SHIPS values("SSBN 729", "Florida", "Lafayette", "June 83", "009")

This scenario exemplifies an insert transaction that will result in a duplicate key row. When a user logged in at level 6 queries the database, he will not retrieve the duplicate tuple at level 11. When this tuple is inserted, the Update Processor will determine that the insert level is level 11, will attempt to insert it at level 11, and will receive a message from the Secure SQL Server that a duplicate key row already exists. The Update Processor will then send a message to the user that the transaction failed. It would then be the responsibility of the SSO to resolve this situation with the user.

TEST SCENARIO 6:

Insert SHIPS values("SSBN 728", "Lafayette", "Lafayette", "Jun 83", "009")

This scenario exemplifies an insert transaction that will result in a tuple being inserted at the user level, level 6, followed by an update operation that will be aborted, since it will result in a duplicate key row. The following tuples reside in the database after the above transaction is executed. The following response, submitted by a user at login level 16, demonstrates the results of this transaction.

1> select sec_label, * from SHIPS
2> go

sec_label                   number    name       class      date    assignment
0x0b0000000000000000000000  SSBN 729  Florida    Lafayette  Jun 83  009
0x0c0000000000000000000000  SSBN 730  Georgia    Ohio       Jun 85  006
0x0b0000000000000000000000  SSBN 728  Florida    Lafayette  Jun 83  009
0x060000000000000000000000  SSBN 728  Lafayette  Lafayette  Jun 83  009

Following, the update transaction:

Update SHIPS set name = "Florida" where name="Lafayette"

is executed. The Update Processor will determine the update level to be 11, since security constraint 2 is satisfied, and will attempt to execute this update at level 11. The server will abort this request and return a message indicating that a duplicate key row already exists. The Update Processor will then send a message to the user that the transaction failed. It would then be the responsibility of the SSO to resolve this situation with the user.

TEST SCENARIO 7:

Insert SHIPS_CLASS values("Ohio", "Nuclear", "17", "187", "20", "Tri I", "Mk 68")

This scenario exemplifies an insert transaction that is affected by security constraint 4. Security constraint 4 is a simple constraint that specifies that all data in the relation SHIPS_CLASS will be at level 8. Therefore, the Update Processor will determine the insert level to be 8. The following response demonstrates the results of this transaction.

1> select sec_label, * from SHIPS_CLASS
2> go

sec_label                   name  classification  length  disp
0x0c0000000000000000000000  Ohio  nuclear         17

(Attributes Continued)
speed  missile  torpedo  gun
187    20       Tri I    Mk 68
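The aborted update of Test Scenario 6 can be reproduced in miniature as follows. This is a sketch under loudly stated assumptions: SQLite stands in for the Secure SQL Server, the integer column `sec_level` replaces the server's sec_label, key uniqueness is assumed to hold per security level (which is what allows SSBN 728 to exist at both level 6 and level 11), and the function name and status strings are hypothetical.

```python
import sqlite3


def run_scenario_6():
    """Attempt the level-11 update of Test Scenario 6 and report the outcome."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "create table SHIPS (sec_level integer, number text, name text, "
        "class text, date text, assignment text)"
    )
    # Assumption: the unique index on number is enforced per security level
    db.execute("create unique index ships_key on SHIPS (sec_level, number)")
    rows = [
        (11, "SSBN 729", "Florida", "Lafayette", "Jun 83", "009"),
        (12, "SSBN 730", "Georgia", "Ohio", "Jun 85", "006"),
        (11, "SSBN 728", "Florida", "Lafayette", "Jun 83", "009"),
        (6, "SSBN 728", "Lafayette", "Lafayette", "Jun 83", "009"),
    ]
    db.executemany("insert into SHIPS values (?, ?, ?, ?, ?, ?)", rows)
    db.commit()
    try:
        # The update is attempted at level 11, where SSBN 728 already exists
        db.execute(
            "update SHIPS set name = 'Florida', sec_level = 11 "
            "where name = 'Lafayette' and sec_level = 6"
        )
        db.commit()
        status = "transaction completed"
    except sqlite3.IntegrityError:
        db.rollback()
        status = "transaction failed"  # the only detail passed to the user
    remaining = db.execute("select count(*) from SHIPS").fetchone()[0]
    db.close()
    return status, remaining
```

The four tuples remain in place after the failed update, which matches the scenario: the user is told only that the transaction failed.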
4. HANDLING SECURITY CONSTRAINTS DURING DATABASE DESIGN

4.1 OVERVIEW

The main focus of this section is a discussion on how association-based constraints (also called together or context-based constraints) could be handled during database design. We then briefly discuss how simple constraints as well as logical constraints could be handled.

An association-based constraint classifies a collection of attributes taken together at a particular security level. What is interesting about the association-based constraint is that it can generate several relationships between the various attributes. For example, if there is a relation SHIPS whose attributes are S#, SNAME, and CAPTAIN, and if an association-based constraint classifies the SNAME and CAPTAIN taken together at the Secret level, then one of the pairs (S#, SNAME), (S#, CAPTAIN) should also be classified at the Secret level. Otherwise, an Unclassified user can obtain the (S#, SNAME) and the (S#, CAPTAIN) pairs and infer the Secret association (SNAME, CAPTAIN). There has been much discussion in the literature as to the appropriate place to handle these association-based constraints. Some argue that they should be handled during database design (Lunt, T., May 1989, "Inference and Aggregation, Facts and Fallacies," Proceedings of the IEEE Symposium on Security and Privacy, Oakland, Calif.), while others argue that they should be handled during query and update processing (Stachour, P., and B. Thuraisingham, June 1990, "Design of LDV: A Multilevel Secure Relational Database Management System," IEEE Transactions on Knowledge and Data Engineering, Volume 2, No. 2). However, none of the work reported so far studied the properties of these association-based constraints, nor has it provided any technique to generate the additional association-based constraints that can be deduced from an initial set of association-based constraints.

We first describe an algorithm which processes a given set of association-based constraints and outputs the schema for the multilevel database. Given a set of association-based constraints and an initial schema, the algorithm will output clusters of attributes and the security level of each cluster. We then prove that the attributes within a cluster can be stored securely at the corresponding level. A tool based on this algorithm can help the systems security officer (SSO) design the multilevel database. The algorithm that we have designed does not necessarily have to be executed during database design only. It can also be executed during query processing. That is, the query processor can examine the attributes in the various clusters generated by the algorithm and then determine which information has to be released to the users. For example, if the algorithm places the attributes A1, A2 in cluster 1 at level L, and the attributes A3, A4 in cluster 2 at level L, then, after an attribute in cluster 1 has been released to a user at level L, none of the attributes in cluster 2 can be released to users at level L.

Since simple constraints can be regarded as a special form of association-based constraints, where only one attribute is classified, we feel that such constraints could also be handled during database design. Another constraint that could be handled during database design is the logical constraint. For example, if attribute A implies an attribute B, and if attribute B is classified at the Secret level, then attribute A must be classified at least at the Secret level. It should be noted that if any of the constraints have conditions attached to them, then handling them during database design time would be difficult. For example, consider the following constraint: "Name and Destination taken together are Secret if destination is a Middle-east country". Such a constraint depends on data values. Therefore, such constraints are best handled during either query or update processing.

The organization of this paper is as follows. In section 6.2 we describe an algorithm which determines the security levels of the attributes given a set of association-based constraints. A tool could be developed based on this algorithm which the SSO could use to design the schema. In section 6.3 we describe how simple constraints could be handled during database design. Finally in section 6.4 we discuss how logical constraints may be processed.

4.2 HANDLING ASSOCIATION-BASED CONSTRAINTS

In this section we describe an algorithm for handling association-based constraints. The input to this algorithm is a set of association-based constraints and a set of attributes. The output of this algorithm is a set of clusters for each security level. Each cluster for a security level L will have a collection of attributes that can be safely classified at the level L. That is, if A1, A2, and A3 are attributes in a cluster C at level Secret, then the attributes A1, A2, and A3 can be classified together safely at the security level Secret without violating security. The clusters are formed depending on the association-based constraints which are input to the program. Once the clusters are formed, then the database can be defined according to the functional and multivalued dependencies that are enforced.

ALGORITHM HABC (Handling Association-Based Constraints)
Begin
  Let C be the set of security constraints and W1, W2, . . . Wm be the set of attributes which are input to the program.
  For each security level L, do the following:
  Begin
    Let C[L] be the largest subset of C and A={A1, A2, . . . An} be the largest subset of {W1, W2, . . . Wm}, such that the elements of C[L] and A are all visible at level L.
    Since n is the number of attributes which are visible at level L, clusters C1, C2, . . . Cn will be formed as follows:
    Set C1=C2=C3=. . .=Cn=Empty-set.
    For each i (1≤i≤n) do the following:
    Begin
      Find the first cluster Cj (1≤j≤n) such that Ai, together with any of the attributes already in Cj, is classified at a level dominated by L by the set of constraints C[L]. Place Ai in the cluster Cj. (Note that since we have defined n clusters, there will definitely be one such Cj.)
    End (for each i).
    Output all the non-empty clusters along with the security level L.
  End (for each security level L).
End (HABC)

Theorem 1: Algorithm HABC is Sound.

Proof of Theorem 1:

We need to show that for every security level L, the attributes in a cluster formed at L can safely be stored together in a file at level L.

Let C be a cluster at level L, and let B1, B2, . . . Br be the attributes in C. Note that before each Bi is placed, it will be first checked to determine whether or not there is an association-based constraint which classifies Bi together with any subset of the attributes B1, B2, . . . Bi-1 already in C at a level not dominated by L. If so, Bi would not have been placed in the cluster C.

Since this is true for each Bi (1≤i≤r), there is no association-based constraint which classifies any subset of B1, B2, . . . Br taken together at a level not dominated by L. Therefore, B1, B2, . . . Br can be safely stored in a file at level L.

Theorem 2: Algorithm HABC is Complete.

Proof of Theorem 2:

We need to show that if Ci and Cj are two clusters

tained, then this schema is given as input to the algorithm handling association-based constraints. The association-based constraints are then applied and the final schema is obtained.

We illustrate the combined algorithm with an example. Let relation R have attributes A1, A2, A3, A4. Let the constraints enforced be the following:

Simple constraint: A4 is Secret.
Association-based constraint: A2 and A3 together are TopSecret.

Applying the algorithm for handling simple constraints we obtain the following: A1, A2, A3 are Unclassified; A1, A2, A3, and A4 are Secret.

Next we apply the algorithm for handling

a leve L, there art , subsete A and ^B, respectively , of Q ^ fffial h . £ md ^ 

rffleilSlL atAandBcamotbest0redt °« etherm 15 are Unclassified; Al, A2, and A4 are Secret; Al, A2, 

* ^X^^^^a^^C^^ ^JSjKSSS^SScAL CONSTRAINTS 

enumeration of the clusters formed at level L. T . , _ t vwi^ wwuwum o 

Suppose, on the contrary, that A and B do not exist , Lo « lcal J™*™* ™ les c T f used to de ~ 
Consider an element X of cluster Cj. Since Ci is before 20 ™* fr ° wn ***** data ; If f fc«nty con- 
Cj in the enumeration, before placing X in Cj, it would stoint classifies the new data at a level that is higher 
have been first checked to determine whether or not X thar * of existm & data > then the existm S data ™>* be 
can be placed in Ci. It would have been found that there re-classified. Logical constraints could be straightfor- 
was an association-based constraint which classifies X ward such as Ai= = > Aj or they could be more com- 
together with the attributes already in subset P of Ci at 2 $ P lex ^ch as Al & A2 & A3 & . . . An= = >Am. If Aj 
a level not dominated by L. That is, the subset P and is classified at the Secret level then A must be classified 
{X} of Ci and Cj respectively cannot be stored in a file at least at Secret level If Am is classified at the 
at level L. That is, we have found two sets A and B, Secret level, then at least one of Al, A2, ... An must be 
which are subsets of Ci and Cj, respectively, which classified at least the Secret level, 
cannot be stored in a file at level L. This is a contradic- 30 In section 4 we showed how the logical constraints 
tion to our assumption. may be handled during query processing. For example 
We now trace the algorithm with a simple example. consider the constraint AiAn— = > Aj. If Aj is classi- 
Let the attributes be Al, A2, A3, A4, AS. Let the fied at the Secret level, and an Unclassified user re- 
constraints be the following: quests for Ai values, the query processor will ensure 
CONl:Al-A2=Secret* 35 that the Ai values are not released. That is, although Ai 
**£s^f^ knMM and A2 taker, together are classified my ^ explicitly assigned the Unclassified level, since 
CON2:Al-AS= Secret ^e ^°S^ C2d constraint is treated as a derivation rule, it 
CON3:Al-A4A5=Secret does not cause m y ^consistency. That is, during query 
CON4:A2» A4= Secret processing, the security level of Ai will be computed to 
CON5:A3-A4=Secret 40 he Secret. 

Note that some of the constraints are redundant. For For logical constraints which do not have any condi- 

exampie, CON2 implies CON3. In this paper we are not tions attached, it appears that they could be handled 

concerned with the redundancy of the constraints. during database design. That is during design time the 

Since the maximum classification level assigned is logical constraints are examined, and the security levels 

Secret, aH the attributes can be stored in a file at the 45 of the attributes specified in the premise of a constraint 

level Secret or Higher. At the Unclassified level, the could be computed For example, if Aj is classified at 

following clusters are created: * e Secret level then it must be ensured during design 

C1={A1, A3} time & at ^ classified at least at the Secret level also. 

C2={A2, A5} The following algorithm will ensure that the security 

C3={A4} 50 levels are computed correctly. 

It should be noted that, although the algorithm guar- L Do the following each logical constraint (Note 
antees that the constraints are processed securely, it that we have assumed that the constraints are ex- 
does not provide any guarantee that the attributes are pressed as horn clauses. That is, there is only one 
not overclassified. More research needs to be done in atom in the head of a clause), 
order to develop an algorithm which does not overclas- 55 2. Check whether there are any simple constraints 
sify an attribute more than is necessary. which classify the attribute appearing in the head 

4.3 A NOTE ON SIMPLE CONSTRAINTS of the logical constraint at any level. If not, ignore 

Since simple constraints classify individual attributes the constraint, 

at a certain security level, they could also be handled 3. If so, find the highest security level L that is speci- 

during database design. Note that when an attribute A 60 fied for this attribute. 

in relation R is classified at level L, then all elements 4. Check whether any of the attributes appearing as 

which belong to A is also classified at level L. There- premises of the logical constraint are classified at 

fore, we can store A itself at level L. least at level L. If so, ignore the constraint. 

The algorithm which handles simple constraint is 5. If not, classify one of the attributes (say, the first 

straightforward. Each attribute that is classified by a 65 one) at the level L. 

simple constraint is stored at level specified in the con- The algorithm given above does not ensure that the 
straint. Once the algorithm for processing simple con- attributes are not overclassified. In order to avoid the 

straints is applied and the corresponding schema is ob- overclassification problem, modification must be made 
to step 5. That is, once an attribute is assigned a security level, it
is possible for the level to be re-assigned based on other logical
constraints that are handled. Our current research includes
investigating techniques for successfully assigning security levels to
the attributes and at the same time avoiding overclassification.

When logical, simple, and association-based constraints are combined,
then the first step would be to handle the simple constraints. The next
step would be to apply the algorithm given above for the logical
constraints. Finally, the algorithm given in section 6.2 is applied for
the association-based constraints.

5. TOWARDS AN INTEGRATED APPROACH

Consider the integrated architecture illustrated in FIG. 5. This
architecture provides an integrated solution to constraint processing in
a multilevel environment. In this section we describe this integrated
solution using an operational example. The example is explained in
natural language.

Time T0: Constraint Generator produces the following output: The
database consisting of the relations SHIPS and GROUPS. Both relations
are Unclassified. The attributes of SHIPS are Number, Name, Class, Date,
and Assignment. Its primary key is Number. The attributes of GROUPS are
Number, Location, Mission, and Siop. Its primary key is Number.
SHIPS.Assignment and GROUPS.Number take values from the same domain.
Also, SHIPS.Assignment is a foreign key. Following are the Security
Constraints: GROUPS.Location and GROUPS.Mission taken together are
Secret; SHIPS.Name is Secret if SHIPS.Class=Los Angeles.

Time T1: Database Design Tool produces the following output: The
database consists of three relations SHIPS, GROUPS, and GRP-MISS. The
relations SHIPS and GROUPS are Unclassified. The relation GRP-MISS is
Secret. The attributes of SHIPS are Number, Name, Class, Date, and
Assignment. Its primary key is Number. The attributes of GROUPS are
Number, Location and Siop. Its primary key is Number. The attributes of
GRP-MISS are Number and Mission. SHIPS.Assignment and GROUPS.Number take
values from the same domain. GROUPS.Number and GRP-MISS.Number take
values from the same domain. Also, SHIPS.Assignment is a foreign key.
Following Security Constraint is enforced: SHIPS.Name is Secret if
SHIPS.Class=Los Angeles.

Time T2: The Update Processor populates the database as follows. Note
that we have included security level as a field in the relations;
1=Unclassified, 10=Secret.

Relation SHIPS

Number    Name                 Class                Date    Assignment  Level
CVN68     Nimitz               Nimitz               May 75  003         1
CV67      John F. Kennedy      John F. Kennedy      Sep 68  001         1
BB 61     Iowa                 Iowa                 Feb 43  003         1
CG47      Ticonderoga          Ticonderoga          Jan 83  005         1
DD963     Spruance             Spruance             Sep 75  006         1
AGF3      La Salle             Converted Raleigh    Feb 64  003         1
WHEC715   Hamilton             Hamilton             Feb 67  003         1
FFG7      Oliver Hazard Perry  Oliver Hazard Perry  Dec 77  001         1
FF1052    Knox                 Knox                 Apr 69  001         1
LSD 36    Anchorage            Anchorage            Mar 69  009         1
LHA 1     Tarawa               Tarawa               May 76  003         1
MCM 1     Avenger              Avenger              Sep 87  003         1
AOR 1     Whichita             Whichita             Jun 69  003         1
AFS 1     Mars                 Mars                 Dec 63  001         1
AE21      Suribachi            Suribachi            Nov 56  009         1
AE23      Nitro                Nitro                May 59  005         1
AO 177    New Cimarron         New Cimarron         Jan 81  001         1
SSN706    Albuquerque          Los Angeles          May 83  006         10
CVN65     Enterprise           Enterprise           Nov 61  009         1
MSO 427   Constant             Aggressive           Sep 54  001         1

Relation GROUPS

Number  Location        Siop
001     North Atlantic  001
002     South Atlantic  002
003     Mediterranean   006
004     Philippines     005
005     Persian Gulf    004
006     Indian Ocean    004
007     North Sea       003
008     North Atlantic  003
009     North Pacific   001

Relation GRP-MISS

Number  Mission                 Level
001     naval exercises         10
002     falklands patrol        10
003     iraq crisis             10
004     stabilize government    10
005     iraq crisis             10
006     naval exercises         10
007     soviet reconnaissance   10
008     oceanographic research  10
009     oceanographic research  10

Time T3: Unclassified user poses queries to select all from SHIPS and
GROUPS.



Result: He will get all of the tuples in GROUPS and
all of the tuples in SHIPS except the one at the level 10
(i.e. the tuple whose ship class is Los Angeles).
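
The result at time T3 can be read as a dominance check on tuple levels. The following is a minimal sketch in Python, not the patent's implementation: the function names, the dictionary layout, and the three sample rows are illustrative assumptions, and the whole tuple is labelled at level 10 as in the populated SHIPS relation above.

```python
# Level encoding from the example: 1 = Unclassified, 10 = Secret.
UNCLASSIFIED, SECRET = 1, 10


def ship_level(ship):
    """Level of a SHIPS tuple under the constraint
    'SHIPS.Name is Secret if SHIPS.Class = Los Angeles' (hypothetical helper)."""
    return SECRET if ship["Class"] == "Los Angeles" else UNCLASSIFIED


def select_all(tuples, clearance, level_of):
    """Return only the tuples whose level is dominated by the user's clearance."""
    return [t for t in tuples if level_of(t) <= clearance]


# Three rows taken from the SHIPS relation; the rest are omitted for brevity.
ships = [
    {"Number": "CVN68", "Name": "Nimitz", "Class": "Nimitz"},
    {"Number": "SSN706", "Name": "Albuquerque", "Class": "Los Angeles"},
    {"Number": "MSO 427", "Name": "Constant", "Class": "Aggressive"},
]

unclassified_view = select_all(ships, UNCLASSIFIED, ship_level)
secret_view = select_all(ships, SECRET, ship_level)
```

Under these assumptions the Unclassified view omits only the Los Angeles class tuple, while a Secret user sees every row.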

Time T4: Real world changes. The constraint which
originally classifies locations and missions together at
the Secret level is removed. Note that such a constraint
is not in the constraint database. Two new constraints
are introduced. One classifies tuples in GROUPS where
the location is Persian Gulf at the Secret level. The
other classifies ship names and assignments taken together
at the Secret level.

The Constraint database is updated to include the two
new constraints. We assume that no changes are made
to the database or to the schema. The constraint updater
informs the query processor of the inconsistency.

Time T5: An Unclassified user poses the same query;
that is, to retrieve all tuples from SHIPS and GROUPS.

Result: He will get all values for the attributes
SHIPS.Number, SHIPS.Name, SHIPS.Class, and
SHIPS.Date provided SHIPS.Class does not have the
value Los Angeles. He will not get any values for
SHIPS.Assignment. He will get all tuples from
GROUPS where the location is not Persian Gulf. Although
GRP-MISS need not be classified at the Secret
level, the user will still not get any data from GRP-MISS
as the relation has not yet been downgraded.
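
The withholding of SHIPS.Assignment at time T5 can be illustrated with a small sketch of query modification under an association-based constraint. This is only one possible reading: the text does not say how the query processor chooses which attribute of a classified combination to withhold, so the greedy left-to-right policy, the function name `releasable`, and the constraint encoding below are all assumptions.

```python
UNCLASSIFIED, SECRET = 1, 10

# Hypothetical encoding: (attributes classified together, level of the combination).
ASSOCIATION_CONSTRAINTS = [
    (frozenset({"Name", "Assignment"}), SECRET),
]


def releasable(requested, clearance, constraints):
    """Greedily release attributes in request order, skipping any attribute
    that would complete a combination classified above the clearance."""
    released = []
    for attr in requested:
        candidate = set(released) | {attr}
        blocked = any(
            attrs <= candidate and level > clearance
            for attrs, level in constraints
        )
        if not blocked:
            released.append(attr)
    return released


query = ["Number", "Name", "Class", "Date", "Assignment"]
unclassified_columns = releasable(query, UNCLASSIFIED, ASSOCIATION_CONSTRAINTS)
secret_columns = releasable(query, SECRET, ASSOCIATION_CONSTRAINTS)
```

With the attributes requested in schema order, the sketch releases Number, Name, Class, and Date to the Unclassified user and withholds Assignment, which agrees with the result stated for T5; a different request order would withhold Name instead, which is why the ordering policy is flagged as an assumption.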

Time T6: The database is re-designed and the database
data is re-classified.

There are three relations SHIPS, SH-ASSIG, and
GROUPS. SHIPS and GROUPS are Unclassified,
SH-ASSIG is Secret. The attributes of SHIPS are Number,
Name, Class, and Date. Its primary key is Number.
SH-ASSIG has attributes Number and Assignment.
SH-ASSIG.Number is its primary key.
SHIPS.Number and SH-ASSIG.Number take values
from the same domain. The attributes of GROUPS are
Number, Location, Mission, and Siop. Its primary key is
Number. SH-ASSIG.Assignment and GROUPS.Number
take values from the same domain. Also,
SH-ASSIG.Assignment is a foreign key. Following are the
Security Constraints: SHIPS.Name is Secret if
SHIPS.Class=Los Angeles. Each of GROUPS.Number,
GROUPS.Location, GROUPS.Mission, and
GROUPS.Siop is Secret if GROUPS.Location=Persian
Gulf. The database is populated as shown below.



Relation GROUPS

Number  Location        Mission                 Siop  Level
001     North Atlantic  naval exercises         001   1
002     South Atlantic  falklands patrol        002   1
003     Mediterranean   iraq crisis             006   1
004     Philippines     stabilize government    005   1
005     Persian Gulf    iraq crisis             004   10
006     Indian Ocean    naval exercises         004   1
007     North Sea       soviet reconnaissance   003   1
008     North Atlantic  oceanographic research  003   1
009     North Pacific   oceanographic research  001   1






Relation SH-ASSIG

Number    Assignment  Level
CVN 68    003         10
CV 67     001         10
BB 61     003         10
CG47      005         10
DD 963    006         10
AGF3      003         10
WHEC715   003         10
FFG7      001         10
FF1052    001         10
LSD 36    009         10
LHA 1     003         10
MCM1      003         10
AOR 1     003         10
AFS 1     001         10
AE21      009         10
AE23      003         10
AO 177    001         10
SSN706    006         10
CVN 65    009         10
MSO 427   001         10








Relation SHIPS

Number    Name                 Class                Date    Level
CVN 68    Nimitz               Nimitz               May 75  1
CV67      John F. Kennedy      John F. Kennedy      Sep 68  1
BB 61     Iowa                 Iowa                 Feb 43  1
CG47      Ticonderoga          Ticonderoga          Jan 83  1
DD 963    Spruance             Spruance             Sep 75  1
AGF3      La Salle             Converted Raleigh    Feb 64  1
WHEC 715  Hamilton             Hamilton             Feb 67  1
FFG7      Oliver Hazard Perry  Oliver Hazard Perry  Dec 77  1
FF1052    Knox                 Knox                 Apr 69  1
LSD 36    Anchorage            Anchorage            Mar 69  1
LHA 1     Tarawa               Tarawa               May 76  1
MCM1      Avenger              Avenger              Sep 87  1
AOR 1     Whichita             Whichita             Jun 69  1
AFS 1     Mars                 Mars                 Dec 63  1
AE21      Suribachi            Suribachi            Nov 56  1
AE23      Nitro                Nitro                May 59  1
AO 177    New Cimarron         New Cimarron         Jan 81  1
SSN 706   Albuquerque          Los Angeles          May 83  10
CVN 65    Enterprise           Enterprise           Nov 61  1
MSO 427   Constant             Aggressive           Sep 54  1



Time T7: Unclassified user poses queries to select all
from SHIPS and GROUPS.

He will get all tuples from SHIPS except the one with
class = Los Angeles; he will get all tuples from
GROUPS except the one with location = Persian Gulf.
He will not get any tuples from SH-ASSIG.
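
The clustering step of algorithm HABL, which underlies the re-design at time T6, can be sketched as follows. This is a simplified illustration rather than the algorithm as claimed: it assumes totally ordered integer levels (1 = Unclassified, 10 = Secret), assumes every attribute and constraint is visible at the level being processed, and creates clusters lazily instead of pre-allocating n empty ones. The function name `habl_clusters` and the constraint encoding are hypothetical.

```python
UNCLASSIFIED, SECRET = 1, 10


def habl_clusters(attributes, constraints, level):
    """Place each attribute in the first cluster where no association-based
    constraint classifies the resulting attribute set above `level`."""
    clusters = []
    for attr in attributes:
        for cluster in clusters:
            candidate = cluster | {attr}
            # The candidate is safe if every constraint it contains is dominated by `level`.
            safe = all(
                lvl <= level
                for attrs, lvl in constraints
                if attrs <= candidate
            )
            if safe:
                cluster.add(attr)
                break
        else:
            # No existing cluster can take the attribute; open a new one.
            clusters.append({attr})
    return clusters


# The traced example: CON1 to CON5 over attributes A1 to A5.
attributes = ["A1", "A2", "A3", "A4", "A5"]
constraints = [
    (frozenset({"A1", "A2"}), SECRET),        # CON1
    (frozenset({"A1", "A5"}), SECRET),        # CON2
    (frozenset({"A1", "A4", "A5"}), SECRET),  # CON3
    (frozenset({"A2", "A4"}), SECRET),        # CON4
    (frozenset({"A3", "A4"}), SECRET),        # CON5
]

unclassified_clusters = habl_clusters(attributes, constraints, UNCLASSIFIED)
secret_clusters = habl_clusters(attributes, constraints, SECRET)
```

At the Unclassified level the sketch yields the clusters {A1, A3}, {A2, A5}, and {A4} from the traced example, and at the Secret level all five attributes fall into a single cluster.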

We claim: 

1. Apparatus for an integrated architecture for an 
extended multilevel secure database management sys- 
tem which processes security constraints to control 
unauthorized inference through logical deduction upon 
queries by users and implemented when the database is 
queried through the database management system, 
when the database is updated through the database 
management system, and when the database is designed, 
the integrated architecture comprising: 
a knowledge base for storing the security constraints, 
application data and information on responses re- 
leased from the multilevel secure database manage- 
ment system; 

a multilevel database which contains data classified at 
different security levels; 

a multilevel metadatabase to store schemas describing
data in the multilevel database, the schemas classified
at said different security levels;

the multilevel secure database management system 
utilized to access the multilevel database for 
queries and updates and to access the multilevel 
metadatabase for querying and updating the sche- 
mas by users cleared to said different security lev- 
els; 

a query processor augmenting the multilevel secure 
database management system and accessing the 
knowledge base to examine the security con- 
straints, application data and responses already 
released and to modify queries to prevent unautho- 
rized inferences and to output a modified query for 
evaluation by the multilevel secure database man- 
agement system, the multilevel secure database 
management system providing an output to the 
query processor which examines the security con- 
straints, the application data, and responses already 
released, and modifies the responses to prevent 
unauthorized inferences;

an update processor augmenting the multilevel secure 
database management system for examining some 
of said security constraints and to assign security 
levels to the data; 

the update processor complementing functions of the 
query processor such that if some of the constraints 
are processed during updates and the data is as- 
signed appropriate security levels, said constraints 
need not be processed by the query processor, for 
performance enhancement the said update proces- 
sor also being used as an off-line tool to determine 
the security levels of the data; 

a multilevel database design tool which examines 
some of the security constraints and assigns secu- 
rity levels to the schemas, the schemas then being 
input to the multilevel secure database manage- 
ment system for storage in the multilevel metadatabase
at the appropriate security levels, the design
tool thereby complementing the functions of the 
query processor so that said some of the constraints 
need not be processed by the query processor for 
performance enhancement; and 
a user interface which accepts query requests from 
the user and passes the query to the query proces- 
sor and accepts update requests from the user and 
passes it to the update processor if operating on- 
line or the user interface accepts the request from 
the user and passes it to the multilevel database 
management system if it is off-line, the user inter- 
face accepting the schema query request from the 
user and passes the query request to the multilevel 
secure database management system, the user inter- 
face further accepting the schema update requests 
from the user and passes it to the multilevel secure 
database management system. 

2. The apparatus of claim 1 wherein the query proces- 
sor comprises: 

a user interface manager to provide the user interface 

and to accept the query requests; 
a constraint manager to manage the security con- 
straints and the knowledge base; 
a query modifier which receives query requests from 
the user interface manager, examines the security 
constraints by communicating with the constraint 
manager and subsequently modifies the query 
which is evaluated against the multilevel secure 
database management system; 
a response processor which accepts the response 
from the multilevel secure database management 
system, examines the security constraints by com- 
municating with the constraint manager, and deter- 
mines which parts of the response are to be re- 
leased to the user; and 
a release database manager which manages the re- 
lease information and provides input to the query 
modifier and the response processor to carry out 
their functions. 

3. The apparatus of claim 1 wherein the update processor
comprises:

a user interface manager for communicating with the 
user; 

a constraint manager which manages the security 
constraints; 

a security level computer which communicates with 
the constraint manager and computes the security 
level of data to be updated; and 
a level upgrader which gets an input from the secu- 
rity level computer, creates an update process at 
the appropriate level and interfaces to the multi- 
level secure database management system. 

4. The apparatus of claim 1 wherein the multilevel
database design tool is a monolithic module whose inputs
are the security constraints and initial schemas, and
whose outputs are modified schemas and their security
levels which can be entered into the multilevel
metadatabase.

* * * * * 
ABSTRACT

An associative (content addressable) optical memory
system is comprised by a matched optical holographic
filter (10, 18, 21, L1, L2) coupled to a digital computing
system (14, 15, 16) having a memory (15, 16). In order to
search the computing system memory for occurrences
of an item, a hologram of a binary representation of the 
item is formed in the Fourier transform plane (18) em- 
ploying laser (11) as the light source. A page of the 
memory to be searched is subsequently displayed at the 
input plane (10), which for example is comprised by a 
liquid crystal over silicon display, and illuminated by 
the laser (11). The light is filtered at the Fourier trans- 
form plane (18). Any occurrences in the input plane 
display of the item result in a respective correlation spot 
at the output plane, which spot output is decoded and 
supplied to the processor (14), thus providing correla- 
tion information to the computing system as to the loca- 
tion in its memory of the occurrences of the item. 

13 Claims, 3 Drawing Figures 
ASSOCIATIVE MEMORY SYSTEMS

BACKGROUND OF THE INVENTION

This invention relates to associative memory systems, that is to say
content addressable memory systems, and in particular to associative
optical memory systems.

SUMMARY OF THE INVENTION

According to one aspect of the present invention there is provided an
associative optical memory system comprising an optical imaging system
in the form of a matched optical holographic filter, and coupled thereto
a digital computing system including a memory, and wherein in use for
searching the computing system memory for occurrences of an item the
computing system controls the input and Fourier transform planes of the
filter and a coherent light source for parallel optical processing of
the memory content, and the output plane of the filter provides
information to the computing system as to the location in the memory of
occurrences of said item.

According to another aspect of the present invention there is provided
an associative optical memory system comprising a matched optical
holographic filter, including an input plane comprised by a liquid
crystal over silicon display, a Fourier transform plane and an output
plane, the planes being separated by thin spherical lenses, and a
coherent light source, and a digital computing system including a
processor and a direct memory access element loaded by a backup store,
which computing system is coupled to the filter for controlling the
input and Fourier transform planes and the light source for parallel
optical processing of the memory content, the output plane providing
information to the processor as to the location of occurrences in the
memory of a searched for item, wherein in use of the memory system a
page of the memory content is loaded into the input plane, stored and
the display blanked, a representation of the item to be recognised is
loaded into a defined area of the display and a hologram thereof
recorded in the Fourier transform plane by a pulsed reference beam
derived from the coherent light source, wherein the loaded page is then
displayed and illuminated by another beam derived from the coherent
light source, wherein the light is filtered in the Fourier transform
plane, detected at the output plane, decoded and passed to the processor
for use in determining the location in the memory of the input display
of the occurrences.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention will now be described with reference to the
accompanying drawings in which:

FIG. 1 illustrates a matched filter double Fourier transformation
one-to-one imaging system (prior art);

FIG. 2 illustrates a possible representation of binary information, and

FIG. 3 illustrates a system block diagram of an associative optical
memory.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Optical holographic techniques give high potential for parallel
processing and while much attention has been paid to holography for mass
storage of digital data, little attention has been paid to the parallel
processing of digital information by such techniques.

Data stored in an immediate access memory of a Von Neumann computer can
be organised in many ways from one extreme of all data being at
specified locations, to the other extreme of searching through the
complete data memory to find the required item or items. Most software
involving a data base of appreciable extent is organised to restrict the
search by arranging the data in a logical manner, so that an algorithm
can be written to reduce the search from a complete set of data items to
a subset. Techniques such as arrays, indexes, buffers, [...].
Nevertheless, there is a limit to what [...] algorithms, and an [...] of
searching item by item for the required degree of match is necessary in
many applications.

[...] Von Neumann architecture the [...] is [...] addressed location and
[...] is a [...] serial [...]. Associative processors, on the other
hand, can [...] characterised such that data can be found according to
what they are, rather than the memory addresses at which they are
stored, and such that operations can be performed over many sets of
arguments at the same time with a single instruction. To do this
effectively, either very high speed serial or parallel processing is
required. Early associative (or content addressable) stores were high
speed serial, but there is a move [...] processing [...] arrays of
processors.

An example of such an array of processors is based on the transputer.
Transputers are Von Neumann machines that can be connected in an array.
Each machine has its own memory and searching can be done in parallel
across as many machines as is desired. However, each machine searches in
a serial fashion through its own memory. Clearly the efficiency of this
process is dependent on the distribution of the information requiring to
be searched between the processors in the array. If the search is
localised to one or a few machines, the rest may be held up waiting for
the answers. If the information is unduly distributed this would imply
that the application is one which is suited to a wide distribution of
the information. These considerations may give rise to conflicting
design constraints affecting the array harness and the functional
division between processors.

Another approach is to use what is called "data flow processors" where
the order of execution of the operators of a program is determined
solely by the data dependencies. It is possible for the operations to be
executed in many different orders and, in particular, for more than one
instruction to be executed concurrently and hence, [...] number of
processors, parallelism can be exploited. The applications of this
approach appear to be primarily high speed computation, rather than data
base management or information handling.

Reduction machines are another possibility of which there are two basic
varieties, i.e. string and graph. The string reduction machines depend
for their efficiency on organising the program and data in a tree
structure. They are special purpose machines suited to applicative
languages. The graph reduction machine is also suited to applicative
language and works in terms of packets. A packet pool takes the place of
a random access memory and a multiprocessor system processes those
packets which contain information on the operations to be performed and
the data to be used in that performance.
Most if not all of the above approaches are tackling compute bound
problems rather than memory bound problems. Signal processing and
arithmetic computation are inherently compute bound (a large amount of
processing on a relatively small amount of data). Data retrieval in
intelligent knowledge based systems can be made into compute bound
problems by complex linking of the data in, for example, a hierarchy
tree structure, or they can become memory bound if simpler structures
involving more searching are invoked. All problems are memory bound
while the data required is on a backing store.

There is a case, therefore, for exploring solutions to memory bound
problems as well as compute bound problems. While there is an element of
competition between the two approaches, they should be capable of being
made to complement each other. To simplify the following, it will be
assumed that the memory to be described interfaces with a conventional
computer architecture, but that is not an overriding constraint.

Memory organisation at present is largely dictated by economic
considerations and is hierarchical. In general, the cost of storage
decreases as the access time increases. Hence the hierarchy; tape, disc
(or drum), magnetic bubble, semiconductor, in increasing cost per bit
and decreasing access time. All of these memories are serial access in
terms of words, bytes or bits. There is no memory that is parallel in
terms of more than a few multiple words.

The normal method of retrieving information that is stored outside of
the immediate access memory, is for the central processor to request a
peripheral to put into the immediate access memory a specified file,
record or set of records. The central processor keeps some sort of index
of the whereabouts of the relevant information and the peripheral
usually searches and finds the exact information and loads it. This can
be done autonomously if the processor has a DMA (direct memory access)
feature.

Once loaded, a more detailed search is often neces-

suited to read only rather than read/write storage. Although both
techniques involve lasers, holography is much more dependent on a
coherent light source. Holographic equipment is bulky because it usually
involves lenses with long focal lengths and needs very precise and
stable dimensioning. Direct recording needs even more precise and stable
dimensioning and usually involves rotating parts but the optical system
is simpler. The reason that direct recording has been used for video
recorders in the home entertainment market is probably because of the
simpler optics and the lack of suitable detectors to cope with highly
parallel outputs. These considerations do not weigh so heavily for data
processing.

The present invention is not, however, concerned with high density
optical storage, for which there are many possibilities not necessarily
using holography, but rather with the parallel processing of information
using holographic techniques, and carrying parallel processing one stage
further than hitherto from the memory towards the processor so that
parallelism can be used to provide rapid search facilities for matching
fields in relatively complex data structures or for individual items.

The basis of much work on optical pattern recognition is the matched
filter, sometimes called a "Vander Lugt" filter, illustrated in FIG. 1.
The filter contains three planes 1, 2 and 3 all of which are situated at
the foci of two thin spherical lenses L1 and L2. Plane 1 is called the
input plane, plane 2 is called the spatial frequency plane or Fourier
transform plane (FTP) and plane 3 is called the output plane. Lens L1
acts to provide the Fourier transform of images in the input plane 1 at
the Fourier transform plane 2, and the lens L2 acts to provide the
inverse Fourier transform of images at the Fourier transform plane 2 at
the output plane 3. The Fourier transform of a real image at the input
plane 1 which is illuminated by coherent light is an interference
pattern or hologram. The input plane 1 contains spatial information in
two dimensions, and the

sary to find a file with a particular value in a set of data FTP, plane 2, contains spatial frequency mformation in 

items. Such a search might advantageously be carried two dimensions. Just as an electrical signal can be oro- 

out as a parallel operation over a number of fields simul- ken into frequency components by a Fourier transform, 

taneously. visual patterns can be treated likewise. Electrical signals 

Optical techniques are potentially capable of a very 45 are, however, usually one dimensional, whereas optical 

high level of parallel processing and have been used for information is two or three dimensional The concern 

some years in the processing and recognition of visual here is with two dimensional optical information, such 

analogue information. Optical techniques are also used as displayed on a page of print, for example, 

for the storage of digital informatin. However, optical The first stage in the production of a filter is to record 

techniques appear not to have been used for the recog- 50 a hologram in the FTP, plane2, corresponding to the 

nition of digital information. information to be recognised. To do this a representa- 

Optical storage of digital information may be divided tion of the information to be recognised (image 6) is 
into two broad categories: (a) using holographic tech- placed in the input plane 1, centred, say, at co and a 
niques and (b) using direct recording of binary informa- point source reference beam indicated by arrow 7 is 
tion. Much work has been done on the former with very 55 located at x=o, yi=-b. The coherent light source 
little commercial success. The latter, because of its com- used to illuminate the image 6 is coherent with the 
monality with home video recording, shows consider- reference beam 7 and hence differs from it only in am- 
able prospect of providing storage media which is a plitude and phase. The resultant hologram in the FTP is 
competitor to magnetic disc. Direct recording lends recorded, for example, on a photo sensitive medium as 
itself to serial read out or limited parallel read out of 60 a plane hologram transparency. The lens L2 and the 
information, whereas holography lends itself to full output plane 3 are not required for this process, 
parallel read out. Direct recording is a surface record- Having recorded the hologram of the information to 
ing. Holographic recording can be either a surface or a be recognised, it being disposed in the FTP, plane 2, a 
volume recording, and in the latter case can lead to very set of unknown patterns is placed in the input plane 1 
high information densities. Direct recording is very 65 and illuminated by coherent light of the same wave- 
sensitive to local imperfections in the vicinity of the length as is used in the recording process. Now, a basic 
recording medium (e.g. dust) whereas holography is property of a hologram is that if it is recorded by illumi- 
not Both approaches use recording media that are more nation of two scenes SI and S2 and then exposed to 
light from S1, a real or virtual image of S2 will result, and vice versa. In the matched filter a real image is produced at the output plane 3. S1 can be regarded as the point source, and S2 initially as the information being recorded and latterly as the unknown pattern. If the unknown pattern contains the information to be recognised, the result will be a representation of the point source in the output plane 3. This represents the autocorrelation, or matching, of the input to the filter. With suitable design, the autocorrelation spot will be more intense than any cross correlation terms and a peak detector can determine its presence and position in the output plane. Its position in the output plane bears a one-to-one correspondence with its position in the input plane. For example, if its position in the input were to be exactly the same as the position when the hologram was made, the spot would be at x=0, y3=-b (note the inversion of the axes). This can be regarded as the reference point and the displacement of the spot from that reference point is a direct measure of its displacement in the input scene from the original position.

There are a number of strategies for making use of matched filter techniques for optical character recognition, as discussed for example in Optical Holography by R. J. Collier, C. B. Burckhardt and L. H. Lin, Academic Press 1971. The strategy depends on the need to recognise:

(a) The position of items x1, x2 ... xn in a scene made up from a set of n items with an arbitrary number of repetitions of each item (e.g. printed character recognition).

(b) [...] detection).

(c) The presence of a number of examples of one item from a set of n items (e.g. searching for one binary pattern in a set of n binary patterns).

Although (a) is more general than (c), there are practical reasons why (even when the application requires it) searching for items one at a time rather than many together may be preferable. However, (c) implies a different matched filter for each item sought and it is then necessary to be able to change matched filters very rapidly if this approach is to be adopted effectively.

A basic problem of this type of optical processing is, therefore, that while the processing is very fast once the information is in the optical filter, actually getting it into position may provide an unacceptable bottleneck.

The most widely used holographic recording material is silver halide photographic emulsion. This is satisfactory from the point of view of resolution and contrast range but needs chemical processing and implies complex mechanical handling problems if filters are to be changed.

There are many other possibilities, some of which do not require chemical processing, for example ferroelectric crystals, inorganic photochromic materials, thermoplastic materials (which require a developing process which is, however, fast by photographic standards), magneto-optic materials, ferroelectric photoconductor, liquid crystal/photoconductor and thermally-written smectic liquid crystal. Our co-pending GB Application No. 840482 (Ser. No. 699,980 filed 2/8/85) (W. A. Crossland - R. W. A. Scarr 44-31) relates to optical processors using a transmission type smectic liquid crystal display in the FTP.

Recording holograms by optical means is not, however, the only possibility. It is also possible to generate holograms by a computer if the patterns to be handled are of sufficient simplicity. Binary patterns can be regarded as arrays of point sources. Such holograms may be generated once, stored in binary form and loaded as required onto a suitable display. Alternatively they could be computed at run time and loaded. However, the time necessary to load the information, if this is done serially, may negate advantages obtained by subsequent parallel processing.

For the purposes of the following description the holograms will be assumed to be recorded by optical means in an undefined recording medium. This being so, together with the fact that a very fast means of production is required, implies that (a) the hologram is recorded by a transient laser pulse of high energy and short duration (microsecond or submicrosecond), (b) that hologram is read by a transient pulse which may be of lower energy than the recording pulse, (c) the read is preferably but not necessarily non-destructive, (d) the hologram storage need not be permanent and can decay in times of the order of milliseconds, (e) the duty cycle of the laser, particularly on recording, is sufficiently low that mean power level limitations are not exceeded, and (f) erasure and recording may be one operation or two depending on the recording medium.

The parallel processing of a "page" of memory [...] will now be considered, [...] equivalent to the set of unknown [...] entries on the "page" [...] illustration n will be taken as 8, but in principle it can be [...] recognisable and it should be possible to mask bits, for example one may be interested in the last m bits of a [...] masked out [...], thus [...] to be recognised: a logical 0, a logical 1, a don't care and delimiters or a frame defining boundaries of a word. It is necessary to have an arrangement that will cope with all patterns without aliasing, that is there must be a zero possibility of a pattern being generated and registered correctly that is an unintentional representation of the one being sought, thus implying a fairly high order of redundancy. The problem that a different pattern in the y dimension might alias the required pattern in the x direction can be avoided by recording and processing in the presence of a fixed but limited pattern, possibly around the edge of the display.

FIG. 2 shows an example of a word design in which a computer word of 8 bits is represented by 27 bits on the display. The invention, however, is not to be considered as limited to 8 bit words; any length may be used. A black dot represents the presence of illumination and a clear dot its absence. The 8 bit words are arranged in rows and columns, only four words being indicated in FIG. 2. Each word is comprised by three sub-rows and nine sub-columns and begins with a column word delimiter (sub-column D) comprised by a sub-column of two clear dots followed by a black dot as indicated. Each first sub-row of a word is comprised by nine clear dots, thus acting as a row word delimiter. To facilitate understanding of FIG. 2 dashed lines have been drawn to delimit the four words thereof. The word occupying row 1, column 1 is (10101010) i.e. hexadecimal AA; that in row 1, column 2 is (MMMM1111) where M means masked i.e. hexadecimal XF where X is any value; that in row 2, column 1 is (00001111) i.e. 0F and that in row 2, column 2 is (00001010) i.e. 0A.
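The FIG. 2 word design described above can be sketched in a few lines of Python. The delimiter layout (a first sub-row of nine clear dots, and a sub-column D of two clear dots followed by a black dot) is taken from the description. How the two lower dots of each data sub-column represent a logical 0, a logical 1 or a masked bit is not recoverable from the text, so the dual-rail mapping in `BIT_DOTS` is purely an assumption made for illustration, as are the function names.

```python
# Dot states: 1 = black dot (illumination present), 0 = clear dot (absent).
SUB_ROWS, SUB_COLS = 3, 9

# ASSUMPTION: the text does not say how the two lower dots of a data
# sub-column encode a bit; this dual-rail mapping is hypothetical.
# Each entry is (middle sub-row dot, bottom sub-row dot).
BIT_DOTS = {"0": (1, 0), "1": (0, 1), "M": (0, 0)}


def encode_word(bits):
    """Return a 3 x 9 grid of dots for an 8-symbol word of '0', '1' or 'M'."""
    if len(bits) != 8 or any(b not in BIT_DOTS for b in bits):
        raise ValueError("expected 8 symbols drawn from '0', '1' and 'M'")
    top = [0] * SUB_COLS   # first sub-row: nine clear dots (row delimiter)
    middle = [0]           # sub-column D: clear, clear, ...
    bottom = [1]           # ... followed by a black dot
    for b in bits:
        mid_dot, low_dot = BIT_DOTS[b]
        middle.append(mid_dot)
        bottom.append(low_dot)
    return [top, middle, bottom]


def render(grid):
    """Draw black dots as '#' and clear dots as '.'."""
    return "\n".join("".join("#" if d else "." for d in row) for row in grid)


if __name__ == "__main__":
    # The four example words named in the description of FIG. 2.
    for word in ("10101010", "MMMM1111", "00001111", "00001010"):
        print(word)
        print(render(encode_word(word)))
```

Under this assumed mapping the bottom sub-row of the data sub-columns carries the plain binary value, and a masked bit contributes no illuminated dot at all.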
The "page" of memory in the input plane can be loaded from a computer's electrical memory, in which case the computer will keep a map relating the contents of the input plane to its internal memory. Such a map would be a simple linear translation from one medium to the other. Alternatively, the page of memory may be the direct output of an optical (holographic) store projected onto the input plane. If so, the information would be processed as described below, and then loaded into an electrical memory if an item being sought was present in the page being displayed. A third possibility is the direct loading of the input plane from a magnetic backing store using direct memory access (DMA). Here there is the possibility of either loading an immediate access memory afterwards if an item is found, or loading the immediate access memory in parallel with loading the input plane. Input displays that take the form of liquid crystal over silicon can store the information for display and allow simultaneous read access to it by a processor.

The interface between the backing store and the display (input plane) must allow for the fact that the binary information needs to be displayed in a redundant format and must provide the necessary translation arrangement. Direct interfacing to a holographic store implies that the information in that store would be in the same redundant format as is required for the display. By selecting the appropriate fields (e.g. the bottom elements of each row and the relevant columns in FIG. 2), the information can be read out of a liquid crystal display store as pure binary information.

The matched filter, which is to be put in the FTP, is a recording of the word pattern sought together with its delimiters. If masking of words is required this is done on recording. The presentation of the word to be recorded can be placed at any arbitrary point on the display (input plane) although on the basis of symmetry it is sensible to place the reference beam at the exact centre of the display and on the focal axis of the system (x=y=0). The intensity of the reference beam should be much greater than that of the other individual point sources representing the binary numbers, in order that the autocorrelation terms be larger than the cross correlation terms at the output plane.

The output plane may contain an array of photodetectors or be comprised by a homogeneous photosensitive plane such as provided for a TV camera. If photodetectors are used, there is one for every word stored on the input plane, located at a position corresponding to the possible position of the words in the input plane. It is assumed that word boundaries are fixed points at the input plane and information is, therefore, fixed within a grid set by these word boundaries. If a photosensitive plate is used its photosensitive storage medium would be scanned to locate the amplitude and position of maxima in the output plane. In either case a replica of the word being sought may be placed at a known point on the input plane so as to act as a reference for determining the level at which to set the correlation threshold. This enables correction to be made automatically for variation in the intensity of illumination of the input plane.

The positions of valid maxima in the output plane are conveyed to the processor which translates them into addresses in its own memory, which may also be the input plane, where the items identified can be found and processed.

The only logical operation the arrangement is capable of performing is "search on masked equality". Whereas semiconductor logic in memory arrangements are capable of such operations as searching for less than, or greater than, and performing logical and arithmetic operations on operands when found, an optical arrangement can only locate the relevant locations; however, it can do these in combinations which the semiconductor logic in memory arrangements is less well able to do. For example, a set of personnel records might be scanned as one operation to find all the persons named Brown with two children and red hair.

From consideration of the search times of a conventional Von Neumann machine and of an optical correlator attached to a Von Neumann machine it is apparent that a very fast means of producing holograms, of the microsecond order, is required; that the optical correlator is most disadvantaged when the information to be scanned is already in an immediate access memory and a single scan of the material is required; that the optical correlator is advantaged when the information is on a backing store and multiple passes or combinations of words in fixed relationships are sought; and that the optical correlator is further advantaged when the backing store material is already in an optical format. In drawing these conclusions it was assumed that, when the information is on a backing store, the limitation is set by the speed of reading that store rather than any limitation on loading the information into the input plane. It is also assumed that processing of information at the output plane can be done at speeds comparable with a conventional processor's memory cycle time.

Compared with the resolution required for holographic optical lenses, the resolution requirements of the recording in the FTP are not onerous. Furthermore, there is no basic requirement to work in the visible range and operation in the infra red can be beneficial if resolution of the holographic recording media is limited.

There is a wide range of possible laser sources which may be used for producing holograms in the FTP. The parameters of a laser which need to be considered are the peak power under pulsed operation and duty cycle, wavelength and spectral distribution of energy. In general, a very highly coherent source is not compatible with high peak power outputs, and it is considered that the high coherence required for monomode communications systems is not necessary. Consideration of hologram resolution points to a source in the infra red rather than the ultra violet. Nd:YAG lasers provide outputs in the 1 µm region and CO2 lasers in the far infra red, about 10 µm.

Photodetection in an array or on a homogeneous photosensitive plate has been referred to above for use in the output plane for detecting correlation peaks. An array of diodes is presently considered preferable since they are all solid state and the processing of their outputs is relatively straightforward. The diode array may be fabricated on a single chip. The density of the detectors would be relatively low, only one being needed per word, so that chip utilisation would be poor. However, the chip could also contain processing circuitry to present correlation peak positions across a defined interface. All of the diodes on the chip would need to have uniform sensitivity and whilst silicon would be ideal from a fabrication point of view the fall-off in its photosensitivity at the infra red end of the spectrum could be disadvantageous.

The above description has covered various approaches to the design of an associative optical mem-



4,701,879 

9 10 

ory, that is to say an associative memory arrangement of large capacity based on the use of a matched holographic filter connected to a digital computing system. A specific system incorporating presently preferred options will now be described with reference to the block diagram of FIG. 3.

An input plane is comprised by a liquid crystal over silicon display 10 incorporating a pleochroic dye and working in a similar manner to that described in British Patent Application No. 8209710 (Ser. No. 2118247A) (W. A. Crossland et al 35-13-7-5). A major difference, however, is that the reference light source derived from the output of laser 11 is coupled by an optical fibre 12a, 12b through a hole in the centre of display 10. A means of turning the reference light source on or off is required and is illustrated as an optical modulator 13 in series between fibre sections 12a and 12b. The optical modulator may be a solid state electro-optic device. Alternatively the reference light source may be switched on or off by some form of liquid crystal device employed as a shutter. The system includes lenses L1 and L2 equivalent to lenses 4 and 5, respectively, of FIG. 1.

To record a hologram a processor 14 loads a representation of the bit pattern to be recognised into a defined area of the input display 10. The rest of the input display is blanked, its content (the input page of memory to be searched) having been previously loaded, from backing store 15 via DMA 16, and stored. The optical modulator 13 is switched on via control line 17. If necessary the processor 14 sets the material in the FTP 18 and comprising a dynamic hologram recording medium into the "ready for record" state, as for example in the case of a ferroelectric crystal, via control line 19. The laser 11 under command by the processor 14 gives a high energy pulse (reference beam) to illuminate the representation of the bit pattern to be recognised (via lens L4, fibre 12a, 12b and modulator 13) and the hologram thereof is recorded in the FTP 18. The optical modulator 13 is then switched off by command of the processor whereby to turn the reference beam off at the display 10.

On command from the processor the input page of memory to be searched is displayed on display 10 and the laser 11 gives a low energy output pulse which via lens L4, lens L3 and beam splitter 20 illuminates the input page on display 10. Lenses L3 and L4 comprise a beam collimator. The light is filtered in the FTP 18 and detected by a photosensitive matrix, e.g. a silicon diode array, at the output plane 21. The output information is decoded and passed back to the processor as correlation information 22, which processor 14 can then locate in the memory of the input display, the DMA 16 being coupled to the processor as shown, the items that have been recognised.

When the processor has finished processing the information in the input plane, which may involve several cycles as outlined above, each cycle being employed for locating a respective item, the input plane (display 10) is reloaded with fresh material via the DMA 16 from the backing store 15. The processor is free to carry out other operations while the DMA loading is taking place.

The data (memory) page capacity of the system can be limited by optical resolution, or fabrication limitations, in any one of the three planes i.e. the input, the Fourier transform and the output plane. However, because the output plane resolution that is required is one bit per word, this is unlikely to be the limitation. The size of the display (input plane) is limited if, as proposed, a liquid crystal over monocrystalline silicon display is used. A four inch (10 cm) wafer of such material might allow some 50 square cm of useful area. If, as in British Application No. 8208710, a transistor and a grid of connections are required behind each element in two out of three rows, assuming the arrangement of FIG. 2, the minimum size of each element will be, say, 10 micron square. This would give a potential for 5 × 10^7 elements, corresponding to 1.4 × 10^7 bits of stored information (8 bit words).

The theoretical limit for storage in the FTP is of the order of 18 × 10^7 elements, or 5.3 × 10^7 bits of stored information, assuming light of a wavelength of 1 micron, no limitation set by hologram definition, the same 50 square cm area for storage, and a plane hologram. The practical limit for data page capacity is therefore determined by the input plane, where an ability to fabricate successfully a number of elements of the 10^7 order is beyond the current state of the art. Techniques involving redundancy and selection on test as used on bubble memories may ease this problem, but are limited by the fact that information being searched for must always have the same relative geometry irrespective of its location on the display.

If capacity is limited at the input plane rather than the FTP, the area of the FTP can be less than that of the input plane, implying an asymmetrical lens L1 in FIG. 3, thus reducing the power requirements needed to produce the hologram.

The processor has the ability to access directly a liquid crystal over silicon version of the input plane, which acts as its memory, as mentioned above, this alternative being indicated by the dotted line "direct processor access" 23 between processor 14 and plane 10.

Thus the present invention proposes an associative optical memory system and the requirements of the component parts of such an arrangement have been outlined. Such systems lend themselves to the parallel processing of millions of bits in the search for a particular item of information. The associative memory concept as outlined is largely complementary to advanced processor architecture and is relevant to 5th generation computers with extensive knowledge/data bases.

I claim:

1. An associative optical memory system comprising an optical imaging system, a coherent light source and a digital computing system including a memory, the optical imaging system being in the form of a matched optical holographic filter having an input plane, a Fourier transform plane and an output plane, the computing system being coupled to the filter and including means for controlling the input plane, the Fourier transform plane and the coherent light source for parallel optical processing of the memory content in order to search for occurrences of an item in the memory, the output plane having an output providing information to the computing system controlling means as to the location in the memory of occurrences of said item, said controlling means serving to form a hologram of said item in the Fourier transform plane, using said coherent light source as the light source for the hologram formation, and the controlling means serving subsequently to display the memory content at the input plane and to illuminate it with said coherent light source, the Fourier transform plane serving to filter light from the input plane, any occurrence in the input plane display of said
item resulting in a corresponding output in the output plane which is detected and provided as said information to the computing system means.

2. A memory system as claimed in claim 1, wherein a processor of the computing system comprises said computing system controlling means, and wherein said memory includes a direct memory access element loaded by a backing store.

3. A memory system as claimed in claim 1, wherein the coherent light source is comprised by a laser the optical output of which can be supplied to illuminate the input plane or which can be supplied as a reference beam to the input plane.

4. A memory system as claimed in claim 3, wherein the reference beam is supplied to the input plane via an optical fibre including an optical modulator therein.

5. A memory system as claimed in claim 1, wherein the input plane is comprised by a liquid crystal over silicon display.

6. A memory system as claimed in claim 1, wherein the input plane is comprised by a liquid crystal over silicon display which also comprises the digital computing system memory.

7. A memory system as claimed in claim 1, wherein the Fourier transform plane is comprised by ferroelectric crystal or thermally written smectic liquid crystal.

8. A memory system as claimed in claim 1, wherein the output plane is comprised by an array of photodiodes.

9. A memory system as claimed in claim 1, wherein said item is comprised as a specific binary pattern or patterns, wherein the memory is configured in the input plane as a plurality of binary patterns, and wherein said matched filter serves to locate the specific binary pattern or patterns against a background of the plurality of binary patterns.

10. A memory system as claimed in claim 9, wherein the binary patterns are defined with a redundant coding that permits discrimination without aliasing.

11. A memory system as claimed in claim 9, wherein each binary pattern comprises a computer word of 8 bits represented at the input plane as 27 bits comprised [...] of which serve to delimit words from one another.

12. A memory system as claimed in claim 11 and wherein the columns other than the first represent logical 0, logical one or don't care (masked).

13. An associative optical memory system comprising a matched optical holographic filter, including an input plane comprised by a liquid crystal over silicon display, a Fourier transform plane and an output plane, the planes being separated by thin spherical lenses, and a coherent light source, and a digital computing system including a processor and a direct memory access element loaded by a backup store, which computing system is coupled to the filter for controlling the input and Fourier transform planes and the light source for parallel optical processing of the memory content, the output plane providing information to the [...] location of occurrences in the memory of a searched for item, wherein in use of the memory system a page of the [...] stored [...] a representation of the item to be recognised is loaded into a defined area of the display and a hologram thereof recorded in the Fourier transform plane by a pulsed reference beam derived from the coherent light source, wherein the loaded page is then displayed and illuminated by another beam derived from the coherent light source, wherein the light is filtered in the Fourier transform plane, detected at the output plane, decoded and passed to the processor for use in determining the location in the memory of the input display of the occurrences.

* * * * *
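The "search on masked equality" operation that the specification attributes to the arrangement can be modelled digitally as a bitwise comparison in which masked ("don't care") bits are ignored. The sketch below is only a software analogy of what the optical correlator reports at the output plane, one hit position per matching word; the function name, the page layout as a list of rows and the example values are assumptions made for illustration.

```python
def search_masked_equality(page, value, care_mask):
    """Return (row, column) of every word that equals `value` on the bits
    set in `care_mask`; bits cleared in `care_mask` are "don't care"."""
    hits = []
    for r, row in enumerate(page):
        for c, word in enumerate(row):
            if ((word ^ value) & care_mask) == 0:
                hits.append((r, c))
    return hits


# A hypothetical 2 x 2 page of 8-bit words.
page = [
    [0xAA, 0x5F],
    [0x0F, 0x0A],
]

# Pattern MMMM1111: only the last four bits are compared.
low_nibble_hits = search_masked_equality(page, value=0x0F, care_mask=0x0F)

# Pattern 10101010 with no masking: all eight bits are compared.
exact_hits = search_masked_equality(page, value=0xAA, care_mask=0xFF)
```

In the optical system every word position is tested in the same illumination pulse, whereas this loop visits them one after another; the result, a list of word positions, is the same kind of information the processor translates into memory addresses.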
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ABSTRACT 
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1162336 2/1984 Canada 364/412 



In a telecommunications system, automatic directory assistance uses a voice processing unit comprising a lexicon of lexemes and data representing a predetermined relationship between each lexeme and calling numbers in a locality served by the automated directory assistance apparatus. The voice processing unit issues messages to a caller making a directory assistance call to prompt the caller to utter a required one of said lexemes. The unit detects the calling number originating a directory assistance call and, responsive to the calling number and the relationship data, computes a probability index representing the likelihood of a lexeme being the subject of the directory assistance call. The unit employs a speech recognizer to recognize, on the basis of the acoustics of the caller's utterance and the probability index, a lexeme corresponding to that uttered by the caller.
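One way to picture how a probability index derived from the calling number might be combined with acoustic evidence is sketched below. The abstract does not say how the two are combined; adding the logarithm of the index to an acoustic log score is an assumption here, as are the relationship table, its invented values, the use of the first three digits of the calling number, and every function name.

```python
import math

# HYPOTHETICAL relationship data: calling-number prefix -> how often callers
# with that prefix ask for each locality name. All values are invented.
RELATIONSHIP = {
    "613": {"Ottawa": 0.7, "Kingston": 0.2, "Toronto": 0.1},
    "416": {"Ottawa": 0.1, "Kingston": 0.1, "Toronto": 0.8},
}


def probability_index(calling_number, lexicon, relationship, floor=1e-3):
    """Return a normalised likelihood for each lexeme given the calling number.
    Lexemes with no recorded relationship get a small floor value."""
    table = relationship.get(calling_number[:3], {})
    raw = {lexeme: max(table.get(lexeme, 0.0), floor) for lexeme in lexicon}
    total = sum(raw.values())
    return {lexeme: p / total for lexeme, p in raw.items()}


def recognize(acoustic_log_scores, index):
    """Pick the lexeme with the best combined score (assumed: log-sum)."""
    return max(
        acoustic_log_scores,
        key=lambda lexeme: acoustic_log_scores[lexeme] + math.log(index[lexeme]),
    )


# Invented acoustic log scores in which two lexemes are nearly tied.
acoustic = {"Ottawa": -10.2, "Kingston": -10.0, "Toronto": -14.0}
lexicon = list(acoustic)

from_613 = recognize(acoustic, probability_index("6135551234", lexicon, RELATIONSHIP))
from_416 = recognize(acoustic, probability_index("4165551234", lexicon, RELATIONSHIP))
```

With these invented numbers the same utterance resolves differently depending on where the call originates, which is the effect the abstract describes.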

20 Claims, 6 Drawing Sheets 
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METHOD AND APPARATUS FOR 
AUTOMATION OF DIRECTORY 
ASSISTANCE USING SPEECH 
RECOGNITION 


BACKGROUND OF THE INVENTION 

1. Technical Field 

The invention relates to a method and apparatus for providing directory assistance, at least partially automatically, to telephone subscribers.

2. Background Art 

In known telephone systems, a telephone subscriber 
requiring directory assistance will dial a predetermined 
telephone number. In North America, the number will typically
be 411 or 555-1212. When a customer makes such a
directory assistance call, the switch routes the call to the first 
available Directory Assistance (DA) operator. When the call 
arrives at the operator's position, an initial search screen at 
the operator's terminal will be updated with information
supplied by the switch, Directory Assistance Software
(DAS), and the Operator Position Controller (TPC). The
switch supplies the calling number, the DAS supplies the
default locality and zone, and the TPC supplies the default
language indicator. While the initial search screen is being
updated, the switch will connect the subscriber to the 
operator. 

When the operator hears the "customer-connected" tone, 
the operator will proceed to complete the call. The operator
will prompt for locality and listing name before searching 
the database. When a unique listing name is found, the 
operator will release the customer to the Audio Response 
Unit (ARU), which will play the number to the subscriber. 

Telephone companies handle billions of directory assistance
calls per year, so it is desirable to reduce labour costs
by minimizing the time for which a directory assistance
operator is involved. As described in U.S. Pat. No. 5,014,303
(Velius) issued May 7, 1991, the entire disclosure of which
is incorporated herein by reference, a reduction can be
achieved by directing each directory assistance call initially
to one of a plurality of speech processing systems which
would elicit the initial directory assistance request from the
subscriber. The speech processing system would compress
the subscriber's spoken request and store it until an operator
position became available, whereupon the speech processing
system would replay the request to the operator. The compression
would allow the request to be replayed to the
operator in less time than the subscriber took to utter it.
Velius mentions that automatic speech recognition also
could be employed to reduce the operator work time. In a
paper entitled "Multiple-Level Evaluation of Speech Recognition
Systems", the entire disclosure of which is incorporated
herein by reference, John F. Pitrelli et al. disclose a
partially automated directory assistance system in which
speech recognition is used to extract a target word, for
example a locality name, from a longer utterance. The
system strips off everything around the target word so that
only the target word is played back to the operator. The
operator initiates further action.

U.S. Pat. No. 4,797,910 (Daudelin) issued Jan. 10, 1989, 
the entire disclosure of which is incorporated herein by 
reference, discloses a method and apparatus in which opera- 
tor involvement is reduced by means of a speech recognition 
system which recognizes spoken commands to determine
the class of call and hence the operator to which it should be
directed. The savings to be achieved by use of Daudelin's
speech recognition system are relatively limited, however,
since it is not capable of recognizing anything more than a
few commands, such as "collect", "calling card", "operator",
and so on. 

These known systems can reduce the time spent by a 
directory assistance operator in dealing with a directory 
assistance call, but only to a very limited extent. 

SUMMARY OF THE INVENTION 

The present invention seeks to eliminate, or at least 
mitigate, the disadvantages of the prior art and has for its 
object to provide an improved new directory assistance 
apparatus and method capable of reducing, or even elimi- 
nating, operator involvement in directory assistance calls. 

To this end, according to one aspect of the present 
invention, there is provided directory assistance apparatus 
for use in a telephone system, comprising a voice processing 
unit having at least one lexicon of lexemes potentially 
recognizable by the unit and data representing a predeter- 
mined relationship between each of said lexemes and each 
of a plurality of call sources in an area served by the 
directory assistance apparatus. The unit also has means for 
issuing messages to a caller making a directory assistance 
call to prompt the caller to utter a required one of the 
lexemes, and means for detecting an identifier, for example 
a portion of a calling number, for the call source from 
whence the directory assistance call was received, means 
responsive to the identifier detected and to the data for 
computing a probability index for each lexeme representing 
the likelihood of that particular one of said lexemes being 
that uttered by the caller, and speech recognition means for 
selecting from the lexicon, on the basis of the acoustics of 
the caller's utterance and the probability index, a lexeme 
corresponding to that uttered by the caller. 

A lexeme is a basic lexical unit of a language and 
comprises one or several words, the elements of which do 
not separately convey the meaning of the whole. 

Preferably, the voice processing unit has several lexicons, 
each comprising a group of lexemes having a common 
characteristic, for example name, language, geographical 
area, and the speech recognition means accesses the lexicons 
selectively in dependence upon a previous user prompt. 

Computation of the probability index may take account of 
a priori call distribution. A priori call distribution weights the 
speech recognition decision to take account of a predeter- 
mined likelihood of a particular locality containing a par- 
ticular destination being sought by a caller. The apparatus 
may use the caller's number to identify the locality from 
which the caller is making the call. 

The probability index might bias the selection in favour
of, for example, addresses in the same geographical area,
such as the same locality.

In preferred embodiments of the present invention the 
voice processing unit elicits a series of utterances by a 
subscriber and, in dependence upon a listing name being 
recognized, initiates automatic accessing of a database to 
determine a corresponding telephone number. 

The apparatus may be arranged to transfer or "deflect" a 
directory assistance call to another directory assistance 
apparatus when it recognizes that the subscriber has uttered 
the name of a locality which is outside its own directory 
area. In such a situation, the above-mentioned predeter- 
mined relationship between the corresponding lexeme and 
the call source is that the lexeme relates to a locality which
is not served by the apparatus.

Thus, embodiments of the invention may comprise means
for prompting a subscriber to specify a locality, means for
detecting a place name uttered in response, means for
comparing the uttered place name with the lexicon and, in
dependence upon the results of the comparison, selecting a
message and playing the message to the subscriber. If the
place name has been identified precisely as a locality name
served by the apparatus, the message may prompt the caller
for more information. Alternatively, if the locality name is
not in the area served by the apparatus, the message could
be to the effect that the locality name spoken is in a different
calling or directory area and include an offer to give the
subscriber the directory assistance number to call. In that
case, the speech recognition system would be capable of
detecting a positive answer and supplying the appropriate
area code. Another variation is that the customer could be
asked if the call should be transferred to the directory
assistance operator in the appropriate area. If the subscriber
answered in the affirmative, the system would initiate the
call transfer.

The caller's responses to the speech recognition system 
may be recorded. If the system disposed of the call entirely 
without the assistance of the operator, the recording could be 
erased immediately. On the other hand, if the call cannot be
handled entirely automatically, at the point at which the call
is handed over to the operator, the recording of selected
segments of the subscriber's utterances could be played back
to the operator. Of course, the recording could be compressed
using the prior art techniques mentioned above.

According to a second aspect of the invention, a method 
of at least partially automating directory assistance in a 
telephone system using directory assistance apparatus com- 
prising a voice processing unit having a lexicon of lexemes 
potentially recognizable by the unit and data representing a
predetermined relationship between each of the lexemes and 
a calling number in an area served by the automated direc- 
tory assistance apparatus, comprises the steps of: 

issuing messages to a caller making a directory assistance 
call to prompt the caller to utter one or more utterances,
detecting an identifier, such as a calling number originating 
a directory assistance call, computing, in response to the 
identifier and said data, a probability index for each lexeme 
representing the likelihood that the lexeme will be selected, 
and employing speech recognition means to recognize, on
the basis of the acoustics of the caller's utterance and the 
probability index, a lexeme corresponding to that uttered by 
the caller. 

Preferably, the voice processing unit has several lexicons,
each having lexemes grouped according to certain characteristics,
e.g. names, localities, languages, and the method
includes the steps of issuing a series of messages and
limiting the recognition process to a different one of the
lexicons according to the most recent message.

The various objects, features, aspects and advantages of 
the present invention will become more apparent from the 
following detailed description, in conjunction with the 
accompanying drawings, of a preferred embodiment of the 
invention.

BRIEF DESCRIPTION OF DRAWINGS 

FIG. 1 is a general block diagram of a known telecom- 
munications system; 

FIG. 2 is a simplified block diagram of parts of a
telecommunications system employing an embodiment of 
the present invention; 



FIGS. 3A and 3B are a general flow chart illustrating the 
processing of a directory assistance call in the system of 
FIG. 2; 

FIG. 4 is a chart illustrating the frequency with which 
certain localities are requested by callers in the same or other 
localities; and 

FIG. 5 is a graph of call distribution according to distance 
and normalized for population of the called locality. 

BEST MODE FOR CARRYING OUT THE 
INVENTION 

FIG. 1 is a block diagram of a telecommunications system 
as described in U.S. Pat. No. 4,797,910. As described 
therein, block 1 is a telecommunications switch operating 
under stored program control. Control 10 is a distributed 
control system operating under the control of a group of data 
and call processing programs to control various parts of the 
switch. Control 10 communicates via link 11 with voice and 
data switching network 12 capable of switching voice and/or 
data between inputs connected to the switching network. An 
automated voice processing unit 14 is connected to the 
switching network 12 and controlled by control 10. The 
automated voice processing unit receives input signals 
which may be either voice or dual tone multifrequency 
(DTMF) signals and is capable of determining whether or 
not the DTMF signals are allowable DTMF signals and 
initiating action appropriately. In the system described in 
U.S. Pat. No. 4,797,910, the voice processing unit is capable 
of distinguishing between the various elements of a predetermined
list of spoken responses. The voice processing unit
14 also is capable of generating tones and voice messages to 
prompt a customer to speak or key information into the 
system for subsequent recognition by the speech recognition 
system. In addition, the voice processing unit 14 is capable 
of recording a short customer response for subsequent 
playback to a human operator. The voice processing unit 14 
generates an output data signal, representing the result of the 
voice processing. This output data signal is sent to control 10 
and used as an input to the program for controlling estab- 
lishment of connections in switching network 12 and for 
generating displays for operator position 24 coupled to the 
network 12 via line 26. In order to set up operator assistance 
calls, switch 1 uses two types of database system. Local 
database 16 is directly accessible by control 10 via switching 
network 12. Remote database system 20 is accessible to
control 10 via switching network 12 and interconnecting 
data network 18. A remote database system is typically used 
for storing data that is shared by many switches. For 
example, a remote database system might store data per- 
taining to customers for a region. The particular remote 
database system 20 that is accessed via data network 18 
would be selected to be the remote database associated with 
the region of the called terminal. Interconnecting data network
18 can be any well known data network and specifically
could be a common channel signalling system such as
the international standard telecommunications signalling 
system CCS 7. 

A transaction recorder 22, connected to control 10, is used 
for recording data about calls for subsequent processing. 
Typically, such data is billing data. The transaction recorder 
22 is also used for recording traffic data in order to engineer 
additions properly and in order to control traffic dynami- 
cally. 

The present invention will be employed in a telecommu- 
nications system which is generally similar to that described 
in U.S. Pat. No. 4,797,910. FIG. 2 is a simplified block 
diagram of parts of the system involved in a directory 
assistance call, corresponding parts having the same refer- 
ence numbers in both FIG. 1 and FIG. 2. As shown in FIG. 
2, block 1 represents a telecommunications switch operating
under stored program control provided by a distributed
control system operating under the control of a group of data
and call processing programs to control various parts of the
switch. The switch 1 comprises a voice and data switching
network 12 capable of switching voice and/or data between 
inputs and outputs of the switching network. As an example, 
FIG. 1 shows a trunk circuit 31 connected to an input of the 
network 12. A caller's station apparatus or terminal 40 is 
connected to the trunk circuit 31 by way of network routing/ 
switching circuitry 30 and an end office 33. The directory 
number of the calling terminal, identified, for example, by 
automatic number identification, is transmitted from the end 
office switch 33 connecting the calling terminal 40 to switch 
1. 

An operator position controller 23 connects a plurality of
operator positions 24 to the switch network 12. Each opera- 
tor position 24 comprises a terminal which is used by an 
operator to control operator assistance calls. Data displays 
for the terminal are generated by operator position controller 
23. A data/voice link 27 connects an automated voice
processing unit 14A to the switching network 12. The 
automated voice processing unit 14A will be similar to that 
described in U.S. Pat. No. 4,797,910 in that it is capable
of generating tones and voice messages to prompt a customer
to speak or key dual tone multifrequency (DTMF)
information into the system, determining whether or not the
DTMF signals are allowable DTMF signals, initiating action
appropriately and applying speech recognition to spoken
inputs. In addition, the voice processing unit 14A is capable
of recording a short customer response for subsequent
playback to a human operator. 

In U.S. Pat. No. 4,797,910, the voice processing unit 14
is capable merely of distinguishing between various elements
of a very limited list of spoken responses to determine
the class of the call and to which operator it should be
directed. By contrast, voice processing unit 14A of
FIG. 2 is augmented with software enabling it to handle a
major part, and in some cases all, of a directory assistance
call.

In order to provide the enhanced capabilities needed to
automate directory assistance calls, at least partially, the 
voice processing unit 14A will employ flexible vocabulary 
recognition technology and a priori probabilities. For details 
of a suitable flexible vocabulary recognition system the 
reader is directed to Canadian patent application number 
2,069,675 filed May 27, 1992 and laid open to the public
Apr. 9, 1993, the entire disclosure of which is incorporated 
herein by reference. 

A priori probability uses the calling number to determine 
a probability index which will be used to weight the speech 
recognition result. The manner in which the a priori probabilities
are determined will be described in more detail later
with reference to FIGS. 4 and 5. 

While it would be possible to use a single lexicon to hold 
all of the lexemes which it is capable of recognizing, the 
voice processing unit 14A has several lexicons, for example,
a language lexicon, a locality lexicon, a YES/NO lexicon 
and a business name lexicon. Hence, each lexicon comprises 
a group of lexemes having common characteristics and the 
voice processing unit 14A will use a different lexicon 
depending upon the state of progress of the call, particularly
the prompt it has just issued to the caller. 
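
The selection of a lexicon according to the most recent prompt can be sketched as a simple lookup. This is only an illustrative sketch: the `Prompt` names, the `LEXICONS` table and its contents, and the `active_lexicon` helper are hypothetical and are not taken from the described apparatus.

```python
from enum import Enum, auto


class Prompt(Enum):
    """Hypothetical labels for the prompts issued during a call."""
    LANGUAGE = auto()
    LOCALITY = auto()
    BUSINESS_YES_NO = auto()
    BUSINESS_NAME = auto()


# Hypothetical lexicon contents, one lexicon per prompt type.
LEXICONS = {
    Prompt.LANGUAGE: ["english", "french"],
    Prompt.LOCALITY: ["montreal", "quebec", "gaspe"],
    Prompt.BUSINESS_YES_NO: ["yes", "no"],
    Prompt.BUSINESS_NAME: ["acme hardware", "city bakery"],
}


def active_lexicon(last_prompt):
    """Return the only lexicon the recognizer should search after `last_prompt`."""
    return LEXICONS[last_prompt]


print(active_lexicon(Prompt.BUSINESS_YES_NO))
```

Restricting the search to one small lexicon per prompt keeps the recognition task for each utterance as small as the call state allows.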

As shown in FIGS. 3A and 3B, in embodiments of the 
present invention, when the voice processing unit 14A 
receives a directory assistance call, it determines in step 301 
whether or not the number of the calling party is known. If
it is not, the voice processing unit immediately redirects the
call for handling by a human operator in step 302. If the
calling number is known, in step 303 the voice processing
unit 14A issues a bilingual greeting message to prompt the 
caller for the preferred language and compares the reply 
with a lexicon of languages. At the same time, the message 
may let the caller know that the service is automated, which 
may help to set the caller in the right frame of mind. 
Identification of language choice at the outset determines the 
language to be used throughout the subsequent process, 
eliminating the need for bilingual prompts throughout the 
discourse and allowing the use of a less complex speech 
recognition system. If no supported language name is
uttered, or the answer is unrecognizable, the voice processing
unit 14A hands off the call to a human operator in step
304 and plays back to the operator whatever response the 
caller made in answer to the prompt for language selection. 
It will be appreciated that the voice processing unit 14A 
records the caller's utterances for subsequent playback to the
operator, as required. 

If the caller selects French or English, in step 305 the 
voice processing unit 14A uses the calling number to set a 
priori probabilities to determine the likelihood of the locality
names in the voice processing unit's locality lexicon being 
requested. The locality lexicon comprises the names of 
localities it can recognize, as well as a listing of latitudes and 
longitudes for determining geographical distances between 
localities and calling numbers. In step 305, the voice pro- 
cessing unit 14A computes a priori probabilities for each 
lexeme in the locality lexicon based upon (i) the population 
of the locality corresponding to the lexeme; (ii) the distance 
between that locality and the calling number, and (iii) 
whether or not the calling number is within that locality. The 
manner in which these a priori probabilities can be deter- 
mined will be described more fully later. 

In step 306, the voice processing unit 14A issues the 
message "For what city?" to prompt the caller to state the 
name of a locality, and tries to recognize the name from its 
locality lexicon using speech recognition based upon the 
acoustics, as described in the afore-mentioned Canadian 
patent application number 2,069,675. The voice processing 
unit will also use the a priori probabilities to influence or 
weight the recognition process. If the locality name cannot 
be recognized, decision steps 307 and 308 cause a message 
to be played, in step 309, to prompt the caller for clarifica- 
tion. The actual message will depend upon the reason for the 
lack of recognition. For example, the caller might be asked 
to speak more clearly. Decision step 308 permits a limited 
number of such attempts at clarification before handing the 
call off to a human operator in step 310. The number of 
attempts will be determined so as to avoid exhausting the 
caller's patience. 
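
The prompt, clarify and hand-off loop of steps 306 to 310 can be sketched as follows. This is a minimal sketch under stated assumptions: `prompt_with_retries`, the `recognize` callback and the default of three attempts are hypothetical, since the text only says that a limited number of attempts is permitted.

```python
def prompt_with_retries(recognize, max_attempts=3):
    """Try to recognize a reply, asking for clarification between attempts.

    `recognize` is a hypothetical callable taking the attempt number and
    returning the recognized lexeme, or None when nothing was recognized.
    `max_attempts` is an assumed limit, not a value given in the text.
    """
    for attempt in range(1, max_attempts + 1):
        result = recognize(attempt)  # prompt the caller and run recognition
        if result is not None:
            return ("recognized", result)
        # Otherwise a clarification message would be played before retrying.
    return ("operator", None)  # attempts exhausted: hand off to a human operator


replies = iter([None, "montreal"])
print(prompt_with_retries(lambda attempt: next(replies)))
```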

If the locality name is recognized, the voice processing 
unit 14A determines in step 311 whether or not the locality 
is served by the directory assistance office handling the call. 
If it is not, the voice processing unit will play a "deflection"
message in step 312 inviting the caller to call directory
assistance for that area. It is envisaged that, in some embodi- 
ments of the invention, the deflection message might also 
give the area code for that locality and even ask the caller if 
the call should be transferred. It should be appreciated that, 
although some localities for other areas are in the lexicon, 
and hence recognizable, there is no corresponding data 
relating them to the calling numbers served by the apparatus 
since the apparatus cannot connect to them. The "predeter- 
mined relationship" between the localities for other areas 
and the calling numbers is simply that they are not available 
through the automated directory assistance apparatus which 
serves the calling numbers. 
If the requested locality is served by the directory assistance
office handling the call, in step 313 the voice processing
unit will transmit a message asking the caller to state
whether or not the desired listing is a business listing and
employ speech recognition and a YES/NO lexicon to recognize
the caller's response. If the response cannot be
recognized, decision steps 314 and 315 and step 316 will
cause a message to be played to seek clarification. If a
predetermined number of attempts at clarification have
failed to elicit a recognizable response, decision step 315
and step 317 hand the call off to a human operator. If a
response is recognized in step 314, decision step 318 (FIG.
3B) determines whether or not a business was selected. If
not, step 319 plays the message "For what listing?" and,
once the caller's response has been recorded, hands off to the
human operator.

If decision step 318 indicates that the required number is 
a business listing, in step 320 the voice processing unit 14A 
plays a message "For what business name?" and employs 
speech recognition and a business name lexicon to recognize
the business name spoken by the caller in reply. Once again, 
the recognition process involves an acoustic determination 
based upon the acoustics of the response and a priori 
probabilities. 

If the business name cannot be recognized, in steps
321, 322 and 323 the unit prompts the caller for clarification
and, as before, hands off to a human operator in step 324 if 
a predetermined number of attempts at clarification fail. 

It should be noted that, when the unit hands off to a human 
operator in step 310, 317, 319 or 324, the operator's screen
will display whatever data the automatic system has man- 
aged to determine from the caller and the recording of the 
caller's responses will be replayed. 

If the unit recognizes the business name spoken by the
caller, in step 325 the unit determines whether or not the 
database 16 lists a main number for the business. If not, the 
unit hands off to the human operator in step 326 and 
language, locality and selected business will be displayed on 
the operator's screen. If there is a main number for the
business, in step 327 the unit plays a message asking if the 
caller wants the main number and uses speech recognition to 
determine the answer. If the caller's response is negative, 
step 328 hands off to the human operator. If the caller asks 
for the main number, however, in step 329 the unit instructs
the playing back of the main number to the caller, and 
terminates the interaction with the caller. 

As mentioned earlier, the use of a priori probabilities 
enhances the speech recognition capabilities of the voice 
processing unit 14A. Determination of a priori probabilities
for locality names (step 305) will now be described. (A
priori probabilities for other lexicons can be determined in
a similar manner appropriate to the "predetermined relationship"
for that lexicon.)

Statistics collected from directory assistance data show a
relation between the origin of a call and its destination. An
a priori model of probability that a person in a particular
numbering plan area NPA and exchange NXX, i.e. with a
phone number (NPA)NXX . . . , will ask for a destination
locality d, is an additional piece of information which
improves the recognition performance. The a priori model
expresses the probability P(d|o) of someone requesting a
destination locality d given that they are making the directory
assistance call from an originating locality o. The
probability P(d|o) depends upon the population of destination
locality d and the distance between destination locality
d and originating locality o. The originating locality o from
which the directory assistance call originates is not known
precisely. From the originating phone number (NPA)NXX . . . ,
the Central Office (CO) is identified using the Bellcore
mapping. Following that step, a set of possible originating
localities associated with that Central Office is considered.
The probability of a caller requesting directory assistance for
a destination locality d from a phone number (NPA)NXX . . .
in an originating locality o is:
$$P(d \mid \text{calling (NPA)NXX}) = \sum_{o \in CO} P(o)\,P(d \mid o) \qquad \text{(Eq 1)}$$

The probability P(o) of each originating locality o associated
with a CO originating a call is proportional to its
population. Finally, the total recognition score for each
locality is a weighted sum of the usual acoustic likelihood
log P(Y_1 Y_2 . . . Y_N | d) and the logarithm of
P(d | calling (NPA)NXX):

$$\mathrm{Score}(d) = \log P(Y_1 Y_2 \ldots Y_N \mid d) + \lambda \log P(d \mid \text{calling (NPA)NXX}) \qquad \text{(Eq 2)}$$
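
The two computations just described, summing over the candidate originating localities of a Central Office and then combining the acoustic log-likelihood with the weighted log prior, can be sketched as below. The function names, the dictionary layout, the toy probabilities and the weight value are all hypothetical and chosen only for illustration.

```python
import math


def prior_for_exchange(p_origin, p_dest_given_origin, dest):
    """Sum P(o) * P(d|o) over the originating localities o of one Central Office.

    p_origin: {o: P(o)} for the localities associated with the CO.
    p_dest_given_origin: {o: {d: P(d|o)}}.
    """
    return sum(
        p_o * p_dest_given_origin[o].get(dest, 0.0)
        for o, p_o in p_origin.items()
    )


def total_score(acoustic_loglik, prior, weight):
    """Acoustic log-likelihood plus the weighted logarithm of the prior."""
    return acoustic_loglik + weight * math.log(prior)


# Toy numbers (assumed): two candidate origins behind one exchange.
p_origin = {"A": 0.75, "B": 0.25}
p_dest = {"A": {"X": 0.2}, "B": {"X": 0.6}}
prior_x = prior_for_exchange(p_origin, p_dest, "X")
print(prior_x, total_score(-10.0, prior_x, weight=2.0))
```

Because the prior enters through its logarithm, a destination that is rarely requested from a given exchange is penalized additively rather than excluded outright, provided its prior is non-zero.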

An a priori model may be arranged to distinguish between 
populations having French or English as their first language. 
Knowing the language selected by the user, the population 
using that language is used to estimate P(d|o). A minimum
value of 10% of the population is used to avoid excessively 
penalizing a language. 

An example of an a priori probability model developed 
using directory assistance data collected in the 514, 418 and 
819 area codes is shown graphically in FIG. 4. In each of
these area codes, the number of requests to and from each 
exchange (NXX) was collected; faint lines appear indicating 
the frequency of "any city requesting Montreal"; "Montreal 
requesting any city"; and "any city requesting itself". From
these data it was possible to estimate the parameters of a 
parametric model predicting the probability of a request for 
information being made for any target locality given the 
calling (NPA)NXX. The parameters of the model proposed 
are the destination locality's population and the distance 
between the two localities. Where o is the originating 
locality, d is the destination locality, and S is the size of a 
locality, then the likelihood of a request about locality d 
given locality o is 

$$L(d \mid o) = S(d) \cdot f(o,d)$$

The normalized likelihood is

$$\hat{L}(d \mid o) = 0.60\,\frac{L(d \mid o)}{\sum_{d'} L(d' \mid o)}$$

where the sum is taken over all d', and d' denotes supported localities.

When the destination locality is also the originating
locality, the likelihood is higher, so this is treated as a special
case.

It is assumed that 60% of directory assistance (DA)
requests are placed to localities including the originating
locality as governed by the equation above, and an additional
40% of DA requests are for the originating locality,
giving

$$P(d \mid o) = \begin{cases} \hat{L}(d \mid o), & d \neq o \\ \hat{L}(d \mid o) + 0.40, & d = o \end{cases}$$
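
The 60%/40% split can be sketched as follows, reading the normalized likelihood as 0.60 times S(d)·f(o,d) divided by its sum over the supported localities, with the additional 0.40 assigned to the originating locality. The function name, the toy populations and the toy distance weight are hypothetical, and the sketch assumes the originating locality is itself a supported locality.

```python
def request_probabilities(origin, sizes, f):
    """Return {d: P(d|o)} for one originating locality.

    sizes: {locality: population S(d)} over the supported localities.
    f: callable f(o, d) giving the distance weight.
    """
    raw = {d: sizes[d] * f(origin, d) for d in sizes}      # L(d|o) = S(d) * f(o, d)
    total = sum(raw.values())
    probs = {d: 0.60 * raw[d] / total for d in sizes}      # 60% spread by likelihood
    probs[origin] += 0.40                                  # extra 40% for the caller's own locality
    return probs


# Toy data (assumed): two localities and a crude distance weight.
sizes = {"A": 1000, "B": 3000}
weight = lambda o, d: 1.0 if o == d else 0.5
print(request_probabilities("A", sizes, weight))
```

With this reading the probabilities over the supported localities sum to one for every originating locality.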

Intuitively, the probability P(d|o) is a function f(o,d) which
varies inversely with the distance between localities. In
order to better define the function, a table of discrete values 
for certain distance ranges was derived from community of 
interest data collected in the three Quebec area codes. The 
distance units used in this section are the ones used by
Bellcore to maintain geographical locality coordinates in 
North America. One kilometre is roughly equal to 1.83 
Bellcore units. The discrete function values f computed for 
a given distance range in the province of Quebec are given 
in the Table below for each area code. Since the goal was to 
obtain an a priori model for the entire province, the values 
for f(o,d) were computed for the province as a whole 
through factoring in the probability of a call originating in 
each area code. This was estimated to be in proportion to the 
number of exchanges [(NPA)NXX] per area code [NPA] 
relative to the province as a whole. 

This gave

$$f(\text{Province}) = 0.40\,f(514) + 0.27\,f(819) + 0.33\,f(418)$$



distance    514    819    418    Province
0-25        1.0    1.0    1.0    1.00
26-50       0.9    0.3    0.7    0.67
51-75       0.4    0.0    0.2    0.23
76-100      0.1    0.0    0.3    0.14
101-125     0.1    0.0    0.1    0.07
126-150     0.1    0.0    0.1    0.07
151-175     0.0    0.0    0.2    0.07
176-200     0.0    0.0    0.0    0.00
>200        0.0    0.0    0.0    0.00
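
As a check on the table, the Province column can be recomputed from the three area-code columns using the weights 0.40, 0.27 and 0.33 quoted above. This is only a sketch: the variable and function names are hypothetical, and the per-area-code values are as read from the table.

```python
# Area-code weights quoted in the text, in the order 514, 819, 418.
AREA_WEIGHTS = (0.40, 0.27, 0.33)

# Distance range -> (f for 514, f for 819, f for 418), as read from the table.
TABLE = {
    "0-25": (1.0, 1.0, 1.0),
    "26-50": (0.9, 0.3, 0.7),
    "51-75": (0.4, 0.0, 0.2),
    "76-100": (0.1, 0.0, 0.3),
    "101-125": (0.1, 0.0, 0.1),
    "126-150": (0.1, 0.0, 0.1),
    "151-175": (0.0, 0.0, 0.2),
    "176-200": (0.0, 0.0, 0.0),
    ">200": (0.0, 0.0, 0.0),
}


def province_value(per_area):
    """Weighted combination of the per-area-code values for one distance range."""
    return sum(w * f for w, f in zip(AREA_WEIGHTS, per_area))


province_column = {rng: round(province_value(v), 2) for rng, v in TABLE.items()}
for rng, value in province_column.items():
    print(f"{rng:>8}  {value:.2f}")
```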



Given the sparseness of data, the model for obtaining
weights as a function of distance was converted from
nonparametric to parametric. For this purpose, a least squares
fit was performed on the data from ranges 26-50 to
151-175. The distance value used in the fitting process was
the median distance in each range. An analysis of various
function forms for the regression showed that the form
below provided the closest fit to the collected data:

$$f(\text{distance}) = \frac{A}{\text{distance}} + B$$

The best coefficients obtained from the analysis were 

A=33.665 

B=-0.305

This function f reaches zero when the distance is equal to
196 units. In order not to eliminate the possibility of
handling a DA request when the distance was greater than
this value, the function was modified to accommodate
distances of up to twice the maximum distance between any
pair of localities with population 10,000 or greater in the
province. The two most distant localities that matched these
criteria were Rouyn-Noranda and Gaspe at a distance of
2,103 units. The maximum distance at which f would be zero
was set to be 4,207 distance units. The function switches to
a negative slope linear function at the point where the
predicted value of f is 0.01. This corresponds to a distance
value of 167.

The final f becomes

The fit of this model to the collected data, labelled
"nonparametric model", is shown in FIG. 5.

In order to determine the effects of the a priori model on 
recognition rate, the model was applied to simulated DA 
requests, and each token in the test set was re-scored to take
a priori likelihood into account. The function used for 
re-scoring was 
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weighted score nas=+K log {P(old)}, 

where nas is the normalized acoustic score, that is, the acoustic
score divided by the number of frames in the utterance. The
proportionality constant K was trained to maximize the 
recognition rate over the province of Quebec. The distribu- 
tion of tokens in the set used for development was normal- 
ized to be that predicted by the a priori model. For this 
reason a correctly recognised simulated DA request from a 
locality to the same locality carries more weight when 
computing recognition rate than does a request for a small 
distant locality with a correspondingly low a priori prob- 
ability. A recognition rate was thus determined per locality 
and then the overall provincial recognition rate was com- 
puted by taking the sum of the rate for all localities in 
proportion to their respective populations. The only assump- 
tion made in applying the model was that the calling 
(NPA)NXX was known, which allowed the utterance to be 
re-scored by mapping it to all localities corresponding to the 
given entry in the Bellcore database. 
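
As a rough illustration of the re-scoring step, the Python sketch below adds a weighted log prior to the frame-normalized acoustic score of each candidate locality. The function name, the tuple layout and the convention that higher scores are better are assumptions made for the example; the text does not specify them.

```python
import math


def rescore(candidates, priors, k):
    """Return candidates sorted by acoustic score plus weighted log prior.

    candidates: list of (locality, acoustic_score, n_frames) tuples.
    priors: dict mapping locality -> a priori probability (0 < p <= 1).
    k: proportionality constant, tuned on development data.
    """
    rescored = []
    for locality, acoustic_score, n_frames in candidates:
        nas = acoustic_score / n_frames  # normalized acoustic score
        weighted = nas + k * math.log(priors[locality])
        rescored.append((locality, weighted))
    # Assumption: higher scores are better (e.g. log-likelihoods).
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```

With k set to zero the ranking reduces to the acoustic ranking alone, which makes the effect of the prior easy to compare on a development set.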

The a priori model was further refined in order to avoid 
favouring the bigger localities unduly, as the recognition rate
for these bigger localities, based on acoustics alone, was 
already above average. For this purpose, constants were 
introduced in the model corresponding to population ranges 
over the target localities in order to reduce the effective 
populations. These constants were not applied to the mod- 
elled distribution of requests since this would invalidate the 
method for computing the provincial recognition rate. The 
function defining likelihood becomes 

L(d|o) = K_r(d) S(d) f(d|o)

where r(d) is a range of destination locality population for 
which the constant K applies. The best ranges and their 
associated constants were then determined empirically from 
a development set. 
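
A minimal Python sketch of a likelihood with population-range constants follows. The range boundaries and constant values used when calling it are placeholders that would have to be determined empirically, and the function name and argument layout are hypothetical.

```python
def likelihood(population, distance_weight, range_constants):
    """Likelihood of one destination: range constant * size * distance weight.

    population: size S(d) of the destination locality.
    distance_weight: value of f for the origin/destination pair.
    range_constants: list of (upper_bound, k) pairs sorted by upper_bound;
        the first range whose bound is >= population supplies the constant.
    """
    for upper_bound, k in range_constants:
        if population <= upper_bound:
            return k * population * distance_weight
    raise ValueError("population exceeds the last range")
```

For example, a table such as `[(10_000, 1.0), (100_000, 0.8), (float("inf"), 0.5)]` (illustrative values only) would reduce the effective population of the largest localities by half.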

Thus, using a priori call distribution, and flexible vocabu- 
lary recognition, embodiments of the present invention are 
capable of automating at least the front end of a directory 
assistance call and in some cases the entire call. 

The embodiment of the invention described above is by 
way of example only. Various modifications of, and alter- 
natives to, its features are possible without departing from 
the scope of the present invention. For example, the voice 
processing unit might be unilingual or multilingual rather 
than bilingual. 

The probability index need not be geographical, but might 
be determined in other ways, such as temporal, perhaps 
according to time-of-day, or week or season-of-year. For
example, certain businesses, such as banks, are unlikely to 
be called at one o'clock in the morning whereas taxi firms 
are. Likewise, people might call a ski resort in winter but not 
in summer. Hence the nature of the business can be used to 
weight the selection of certain lexemes for a particular 
enquiry. 

The use of several lexicons, each comprising a group of 
lexemes, allows the voice processing unit to restrict the field 
of search for each prompt, thereby improving speed and 
accuracy. Nevertheless, it would be possible to use a single 
lexicon rather than the several lexicons described above. 

It should be appreciated that although (NPA)NXX num- 
bers have been mentioned, the invention is not limited to 
North American calls but with suitable modification could 
handle international calls. 

It is envisaged that the voice processing unit 14A might
dispense with computing a probability index for all destination
localities served by the directory assistance apparatus.
Instead, the locality lexemes could be grouped into
predetermined subsets according to call identifiers, and the
acoustic determination using speech recognition could simply
be limited to such a subset. In such a modified system,
the voice processing unit 14A would comprise, in addition
to the lexicon of localities, data identifying, for each call
identifier, a subset of localities to which the speech recognition
process would be limited. Process step 305 (FIG. 3A)
would be replaced with the step "SELECT A SUBSET OF
POTENTIALLY-REQUESTED LOCALITY NAMES
BASED UPON CALLING NUMBER". Each subset would
be preselected as those localities which gave greatest recognition
accuracy, perhaps based upon empirical data
acquired during actual service.
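
The subset variant can be sketched in Python as a table lookup followed by a restricted search. The six-digit NPA-NXX key, the fallback list and both function names are assumptions for illustration; the text does not say how an unknown calling number would be handled.

```python
def select_subset(calling_number, subsets, default):
    """Pick the locality subset for a caller from its (NPA)NXX prefix.

    calling_number: calling number as a string, e.g. "(514) 555-0100".
    subsets: dict mapping a six-digit NPA-NXX prefix to a list of localities.
    default: list used when the prefix is unknown (an assumption).
    """
    digits = "".join(ch for ch in calling_number if ch.isdigit())
    return subsets.get(digits[:6], default)


def recognize(acoustic_scores, subset):
    """Best-scoring lexeme among those in the subset (higher is better)."""
    allowed = {name: s for name, s in acoustic_scores.items() if name in subset}
    return max(allowed, key=allowed.get) if allowed else None
```

Restricting the search this way trades the probability computation for a static table that can be tuned from service data.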

Although an embodiment of the invention has been 
described and illustrated in detail, it is to be clearly under- 
stood that the same is by way of illustration and example 
only and is not to be taken by way of limitation, the spirit
and scope of the present invention being limited only by the 
appended claims. 

We claim:

One such input is trunk 31 which connects a
calling terminal 40 to the network 12 by way of interconnecting
network 30. Another calling terminal 46 is shown
connected in like manner by interconnecting network 32. A
third calling terminal is connected to the network 12 by
customer line 44.

1. Directory assistance apparatus for a telephone system
comprising: a voice processing unit having at least one
lexicon of lexemes potentially recognizable by the unit and
data predetermined for each of said lexemes, means for
issuing messages to a caller making a directory assistance
call to prompt the caller to utter one of said lexemes; means
for detecting an identifier for the call source from whence a
directory assistance call was received; means responsive to
the identifier detected, and to said data, for computing a
probability index for each lexeme representing the likelihood
of that particular one of said lexemes being that uttered
by the caller; and speech recognition means for recognizing,
on the basis of the acoustics of the caller's utterance and the
probability indexes, a lexeme corresponding to that uttered
by the caller.

2. Apparatus as claimed in claim 1, wherein the detecting 
means serves to detect as said identifier comprising at least
a portion of a calling number from whence the directory 
assistance call was made. 

3. Apparatus as claimed in claim 1, further comprising 
means transmitting to the caller a message giving a directory 
number determined using the recognized lexeme.

4. Apparatus as claimed in claim 1, wherein the lexemes 
comprise names of localities within said predetermined area; 
the data comprise the size of each locality and the distance 
between each pair of localities; and the means for computing 
the probability index computes for each locality, the likelihood
of the caller requesting that locality based upon the
distance between that locality and the caller's locality and
upon the size of that locality for which the probability index
is being computed.

5. Apparatus as claimed in claim 4, wherein the size of
each locality is determined by the number of local active 
directory numbers in the locality. 

6. Apparatus as claimed in claim 1, wherein the voice 
processing unit has one or more additional lexicons, each 
lexicon comprising a group of lexemes having a common
characteristic, and the speech recognition means accesses 
the lexicons selectively in dependence upon one or more 
messages previously issued to the caller. 

7. Apparatus as claimed in claim 1, wherein the voice
processing unit has one or more additional lexicons, each
lexicon comprising a group of lexemes having a common
characteristic, the computing means computes said index for
the lexicons selectively depending upon one or more messages
previously played to the caller and the speech recognition
means accesses the lexicons selectively in dependence
upon one or more messages previously issued to the caller.

8. Apparatus as claimed in claim 1, wherein the lexemes 
comprise names of businesses and the data comprise the nature
of the businesses. 

9. Directory assistance apparatus for a telephone system, 
including a voice processing unit having a lexicon of lex- 
emes potentially recognizable by the unit, said lexemes 
including lexemes corresponding to localities in a predeter- 
mined area served by the directory assistance apparatus and 
lexemes corresponding to localities not in the predetermined 
area, the unit including: 

means for issuing to a directory assistance caller a mes- 
sage inviting the caller to utter the name of a locality; 

means for recognizing one of said lexemes from the 
utterance; 

means for determining whether or not the recognized 
lexeme is one of said lexemes corresponding to locali- 
ties not in the predetermined area served by the direc- 
tory assistance apparatus; and 

means for playing a message to the caller inviting the 
caller to direct the directory assistance request to a 
directory assistance apparatus for an alternative area 
including the locality corresponding to the recognized 
lexeme in the event that the recognized lexeme is not in 
the predetermined area. 

10. A method of at least partially automating directory 
assistance in a telephone system in which directory assis- 
tance apparatus comprises a voice processing unit having a 
lexicon of lexemes potentially recognizable by the unit and 
data predetermined for each lexeme, the method comprising 
the steps of: 

issuing messages to a caller making a directory assistance 
call to prompt the caller to utter one of said lexemes; 

detecting an identifier for a call source from whence the 
directory assistance call was received; 

computing, in response to the identifier and said data, a 
probability index for each lexeme representing the 
likelihood that the lexeme will be that uttered by the 
caller, and employing speech recognition means to 
recognize, on the basis of the acoustics of the caller's 
utterance and the probability index, a lexeme corre- 
sponding to that uttered by the caller. 

11. A method as claimed in claim 10, wherein the iden- 
tifier comprises at least a portion of a calling number of the 
call source. 

12. A method as claimed in claim 10, further comprising 
the step of transmitting a message to the caller giving a 
directory number determined using the recognized lexeme. 

13. A method as claimed in claim 10, wherein the lexemes 
comprise names of localities within said area; the data 
comprise the size of each locality and the distance between 
each pair of localities; and the computing the probability 
index computes for each locality, the likelihood of the caller 
requesting that locality based upon the distance between that 
locality and the caller's locality and upon the size of [the 
caller's] that locality for which the probability index is being
computed. 

14. A method as claimed in claim 13, wherein the size of 
a locality is determined by the number of active directory
numbers in the locality. 

15. A method as claimed in claim 10, wherein the voice
processing unit has one or more additional lexicons, each
lexicon comprising a group of lexemes having a common
characteristic and the speech recognition means is employed
to access the plurality of lexicons selectively in dependence
upon one or more messages previously issued to the caller.

16. A method as claimed in claim 10, wherein the voice
processing unit has one or more additional lexicons, each
lexicon comprising a group of lexemes having a common
characteristic, the computing means computes said index for
lexemes in the different lexicons selectively, in dependence
upon one or more messages previously issued to the caller
and the speech recognition means is employed to access the
plurality of lexicons selectively in dependence upon one or
more messages previously issued to the caller.

17. A method as claimed in claim 10, wherein the lexemes 
comprise names of businesses and the data comprise the 
nature of the business.

18. A method of at least partially automating directory 
assistance in a telephone system having directory assistance 
apparatus serving a predetermined area and comprising a 
voice processing unit having a lexicon of lexemes potentially
recognizable by the unit, said lexemes including
lexemes corresponding to localities in a predetermined area 
served by the directory assistance apparatus and lexemes 
corresponding to localities not in the predetermined area, the 
method including the steps of: 

using the voice processing unit to issue to a directory
assistance caller a message inviting the caller to utter a 
name of a locality; 

recognizing one of said lexemes in the utterance;

determining whether or not the recognized lexeme is one
of said lexemes corresponding to localities not in said
predetermined area served by the apparatus; and

playing a message to the caller inviting the caller to direct 
the directory assistance request to a different directory 
assistance area in the event the recognized lexeme is 
not in the predetermined area.

19. Directory assistance apparatus, for a telephone system,
comprising: a voice processing unit having at least one
lexicon of lexemes potentially recognizable by the unit and 
data grouping the lexemes into predetermined subsets, each 
subset comprising lexemes preselected to give greater rec- 
ognition accuracy for calls from a particular source; means 
for issuing messages to a caller making a directory assis- 
tance call to prompt the caller to utter one of said lexemes; 
means for detecting an identifier for the call source from 
whence the directory assistance call was received; means 
responsive to the detected identifier for selecting one of said 
predetermined subsets; and speech recognition means lim- 
ited to the selected subset for recognizing, on the basis of the 
acoustics of the caller's utterance, a lexeme from said subset 
corresponding to that uttered by the caller. 

20. A method of at least partially automating directory 
assistance in a telephone system in which directory assis- 
tance apparatus comprises a voice processing unit having a 
lexicon of lexemes potentially recognizable by the unit, and 
data grouping the lexemes into predetermined subsets, each 
subset preselected as giving greater recognition accuracy for 
calls from a particular source, the method comprising the 
steps of: 

issuing messages to a caller making a directory assistance
call to prompt the caller to utter one or more
utterances;

detecting an identifier for a call source from whence the 
directory assistance call was received; 

selecting on the basis of the identifier one of said prede- 
termined subsets; and 

employing speech recognition means to recognize, from 
the selected subset and on the basis of the acoustics of 
the caller's utterance, a lexeme corresponding to that 
uttered by the caller. 

* * * * *

(57) Easily accessed and widely available language
interpretation services are provided in a
public switched telephone network by a com- 
mon platform adjunct which automatically con- 
nects an interpretation services subscriber with 
a selected language interpreter associated with 
a language interpretation platform in the net- 
work. A subscriber dials, for example, an inter- 
national telephone number which includes a 
code indicating that the call is an international 
call, a country code, a city code, and a local 
destination number. The ANI of the subscriber 
is detected and the call is routed to the adjunct 
which further verifies and validates the sub- 
scriber. The adjunct places a call through the 
public switched telephone network to the lan- 
guage interpretation platform. The call is 
answered either by an automatically preselec- 
ted interpreter or by a human operator who 
causes the call to be manually transferred to a 
desired interpreter. The international call is 
completed to the destination and the calling 
subscriber, the interpreter, and the called party 
are bridged together.

Technical Field

This invention relates to telecommunications be- 
tween parties who speak different languages. More 
specifically, this invention relates to language inter- 
pretation services provided in a telecommunications 
system. 

Background of the Invention

Today's telecommunication systems make it routine for persons from
different countries to communicate with each other on a regular basis.
Very often, however, the parties to such a call do not speak the same
language. The usefulness of today's telecommunications systems could be
greatly improved if there were some way to translate communications
from one language to another in a telephone network.

There have been proposals to develop computers in telecommunication
systems which can automatically translate voice communications from one
language into another language. These efforts are currently in a
somewhat rudimentary stage and are far from becoming a commercially
practical reality.

In the meantime, AT&T offers a language interpretation service which is
currently a part of the AT&T switched network. Known as the AT&T
Language Line® Service, it allows a caller to contact a human
interpreter for assistance in making a telephone call expected to
involve parties speaking different languages. The caller dials an 800
number to reach the service after which the caller gives his or her
credit card or AT&T calling card number to an operator. An operator
takes additional information from the caller about the call including
the phone number of the called party and the languages expected to be
spoken. The operator then connects the caller to a human interpreter
fluent in the languages to be spoken during the phone call. The caller
or operator completes the call to the called party resulting in a
conference call between the caller, the called party, and the
interpreter.

Summary of the Invention

Although AT&T's Language Line® Service is a distinct and commercially
significant improvement in the way telecommunications services are
provided in today's networks, we have discovered that interpretation
services provided by a telecommunications network can be markedly
improved if the interpretation service were easier to access and if
more people were able to partake of the service.

In this regard, there are two aspects of existing language
interpretation services in telecommunications systems which can be
improved. First, the amount and complexity of the information which
must be entered into the telecommunications system to initiate and set
up a phone call using the interpretation service is too great for
convenient use of the service. Second, the requirement that only
callers having credit cards or phone cards may use the service unduly
limits the number of potential customers for the service. This
invention provides a language interpretation service which is easier to
use than existing language interpretation services. In particular,
language interpretation is automatically made available for telephone
calls initiated by conventional direct dialing procedures or by
substantially similar procedures normally used to make a telephone call
to a desired destination. No extra telephone numbers need to be used
and billing for the services may be accomplished without a need for a
credit card or a phone card.

In one example of the invention, a presubscribed caller is
automatically given access to an interpretation service whenever a
standard telephone call is made from a directory number associated with
the subscriber stored in the network. No special 800 number must be
dialed and the caller need not have a credit card or phone calling
card. In another example of the invention, a caller may be given
automatic access to an interpretation service by dialing a special
prefix or suffix along with the direct dial telephone number of a
called party.

These are just two examples of the invention, the full scope of which
is defined in the claims appended to this application. Other examples
of the invention will be apparent from the claims and the following
detailed description of the preferred embodiments.

Brief Description of the Drawing

FIG. 1 is a block diagram of a public switched telephone network
providing a language interpretation service in accordance with this
invention.

FIGs. 2A to 2D depict a flow chart representing an illustrative call
flow for a language interpretation service provided by the network of
FIG. 1.

Detailed Description

FIG. 1 shows an example of a public switched telephone network
architecture which may be used to implement various examples of this
invention. FIG. 1 includes a schematic diagram of representative
portions of a typical telecommunications network which includes a long
distance public switched telephone network provided by a long distance
carrier such as AT&T. As shown in FIG. 1, entry into the long distance
network is from a local public switched telephone network provided by a
local exchange carrier (LEC) such as one of the Regional Bell Operating
Companies. In this example, the long distance public switched telephone
network implements a language interpretation service in accordance with
the invention, as described in more detail below. The language
interpretation service could also be provided in one or more of the
local public switched telephone networks such as those by which
telephone customers typically gain access to long distance telephone
networks.

The network shown in FIG. 1 includes a public
switched telephone network 10 which provides local
telephone service to a number of telephone customers.
One of those customers is a language interpretation
service subscriber 12 shown in FIG. 1. The subscriber
12 is connected to the network 10 by a suitable
subscriber line 14 which may provide a voice circuit
and suitable signaling capability, such as dual
tone multiple frequency (DTMF) signaling capability.
The network 10 is connected to an originating toll
switch 16 in the long distance network by means of a
suitable trunk connection 18.

A line 20, which may be configured to operate as
a primary rate interface (PRI), connects the switch 16
to a teleconference bridge 22 which will be used to
create a conference between the subscriber 12, an
interpreter, and a called party. The bridge 22 is connected
to a common platform adjunct 24 by means of a
suitable line 26. The adjunct 24 is a computer which
effectuates the routing of calls through the network
to connect the subscriber 12 with a called party and
a language interpretation system in the network. The
adjunct 24 may contain a subscriber validation database
25 containing profiles of those who subscribe to
the interpretation service. The adjunct 24 also may
contain a tones and announcements database 27
containing announcements and tones by which the
adjunct may send appropriate messages to the subscriber
12. The adjunct 24 may also contain a processor
29 which may have a voice recognition circuit
which receives and acts upon voice responses from
the subscriber. The processor 29 may also have detection
circuitry which receives and acts upon signaling
received from the subscriber 12. The processor
29 routes telephone calls through various parts of the
network, and also selectively retrieves information
from the database 25, selectively presents the announcements
and tones from the database 27, and
otherwise coordinates the activities of the adjunct 24
with the activities of the rest of the public switched
network via the bridge 22. Although the adjunct 24
and bridge 22 are shown in FIG. 1 to be connected to
the originating switch 16, those elements may be
connected to any convenient switch in the network. In
addition, although the bridge 22 is shown as a component
separate from the adjunct 24 and the switch
16, it may also be implemented as a hardware or software
structure in either the adjunct 24 or the switch
16.

The toll switch 16 is trunked to a toll switch 28
which acts as a terminating switch for a language
interpretation platform 30. The switch 28 is connected
to the switch 16 by means of a trunk connection 32.
Although a single direct trunk connection 32 is shown
in FIG. 1, there may be additional switches and other
network circuit elements between the switch 16 and
the switch 28 depending on the locations of the subscriber
12, the adjunct 24, the platform 30, and the
called party.

The toll switch 28 is connected to the language
interpretation platform 30 by means of a line 34 which
may be a primary rate interface. A PBX 36 in the plat- 
form 30 is connected to line 34 and serves to route 
the calls to an appropriate language interpreter LI. 
The PBX may be connected to any number of lan- 
guage interpreters only one of which is shown in FIG. 
1. The PBX 36, in some examples of the invention, 
may be connected to one or more live operators LO 
who may facilitate the connection of the subscriber 
12 to an appropriate language interpreter LI. The in- 
terpreters LI could reside at the PBX 36 as shown in 
FIG. 1 or they could reside at any other PBX or off a 
local switch in the network. 

The toll switch 28 is connected to a toll switch 38 
which acts as an international gateway between the 
domestic long distance network shown in FIG. 1 and 
one or more telephone networks operated by foreign 
carriers, one of those networks being shown in FIG. 
1 and given reference numeral 40. The toll switch 28 
is connected to the international gateway toll switch 
38 by means of a suitable trunk connection 42 which 
may or may not have additional switches and network 
elements between the toll switch 28 and the toll 
switch 38 depending on the relative locations of the 
switch 28, the platform 30 and the toll switch 38. The 
switch 38 is connected with the foreign network 40 
via suitable transmission equipment 44. 

The language interpretation features of the tele- 
phone network shown in FIG. 1 allow subscribers to 
receive language interpretation support for all direct
dialed international voice calls, for example. Sub- 
scribers may place international voice calls using cur- 
rent international plain old telephone service (POTS) 
dialing procedures. In one example of the invention, 
users will presubscribe to the interpretation service. 
In other examples of the invention, end users will not 
have to presubscribe. In some examples of the inven- 
tion, subscribers may initiate calls involving language 
interpretation from their own directory number or 
from other predefined telephone numbers such as 
those associated with certain public phones in air- 
ports or other transportation facilities. In addition, in- 
terpretation services may be obtained not only for in- 
ternational telephone calls, but also for domestic tel- 
ephone calls known to involve parties speaking dif- 
ferent languages, including toll-free domestic tele- 
phone calls. 

In one detailed example of the invention, shown
in FIGs. 2A-2B, subscribers to language interpretation
services will first dial a designation that the call
will be an international call. For example, subscribers
will first dial 011. Next, the subscriber will dial a country
code indicating the country to which the phone
call is to be directed and a national number representing
the destination number of the called party. The
numbers dialed by the subscriber are passed through
the local telephone network 10 and are received by
the toll switch 16 in block 46. The toll switch 16 also
receives and screens the subscriber's automated
number identification (ANI) in block 48. In block 50,
the switch 16 checks to see if this call is coming from
a directory number which is stored in the switch 16 as
having subscribed to the interpretation service. In
this regard, the switch 16 is programmed by a provisioning
system 31 with a list of telephone numbers
representing those which are associated with subscribers
to the interpretation service. If the switch 16
determines that the call is being made from a telephone
number which is subscribed to the interpretation
service, and the caller is thereby identified as an
interpretation service subscriber, the call is routed by
the switch 16 in block 52 to the adjunct 24. If the ANI
is not identified as belonging to a subscriber, then the
call is routed normally, as shown in block 54. The adjunct
24 will further validate the ANI in block 56 by
checking suitable information stored in database 25.
For example, the adjunct 24 may check a list of subscribers
who have past due and unpaid bills. In some
examples of the invention, entry of a password
known only to the subscriber may be required. The
adjunct 24 will check to see if the entered password
is a password associated with a subscriber. If the subscriber
is not validated, service is denied or the call
is not completed, or both, as shown in block 58. If the
subscriber is validated, the adjunct 24 may prompt
the user in block 60, for example, by playing an announcement,
for further information on call handling.
The prompt of the subscriber can be either a tone
(e.g., a "bong") or an announcement. An example of
a suitable announcement would be an indication that
the caller has reached an on-demand interpretation
service. The announcement may state that the caller
should enter a predetermined shorthand activation
code indicating that the caller wishes to use the interpretation
service. For example, the announcement
may request that the caller press a certain sequence
of buttons on a Touch Tone telephone such as the *
key followed by one of the numerical keys. A plurality
of different interpretation services may be offered
and the subscriber may indicate a selection of one of
those services by the code which he or she enters.
The announcement may also indicate to the subscriber
that he or she may forego the interpretation service
by entering a service refusal code, for example, by
pressing the # symbol on a Touch Tone telephone.
Following the tone or the announcement, a timer is
set by the adjunct 24. If the timer expires, for example,
after a period of five seconds from the tone or announcement,
the call will be handed over from the adjunct
24 to the toll switch 16 to be routed like a normal
POTS call as shown in block 54 and the line between
blocks 54 and 60 in FIG. 2A. If the subscriber signifies
that he or she does not wish to use the interpretation
service by entry of a service refusal code, the adjunct
24 will immediately hand the call over to the switch
16 for normal processing in block 54.
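
The screening, validation and prompt handling described above can be summarized as a small decision function. The Python sketch below is illustrative only: the names, the example activation code "*1", and the treatment of unrecognized input as "no request" are assumptions not stated in the text, and the validation check is passed in as a callable standing in for the adjunct's database lookup.

```python
from enum import Enum


class Action(Enum):
    ROUTE_NORMALLY = "route as a normal POTS call"
    DENY = "deny service"
    CALL_PLATFORM = "place call to interpretation platform"


def handle_call(ani, subscribed_anis, is_valid, response,
                activation_code="*1", refusal_code="#"):
    """Decide how to treat a dialed call from its ANI and the caller's input.

    ani: calling number reported to the toll switch.
    subscribed_anis: set of numbers provisioned in the switch.
    is_valid: callable standing in for the adjunct's database check.
    response: digits entered after the prompt, or None on timer expiry.
    """
    if ani not in subscribed_anis:        # block 50 -> block 54
        return Action.ROUTE_NORMALLY
    if not is_valid(ani):                 # block 56 -> block 58
        return Action.DENY
    if response is None or response == refusal_code:  # timeout or refusal
        return Action.ROUTE_NORMALLY
    if response == activation_code:       # block 62
        return Action.CALL_PLATFORM
    # Assumption: any other input is treated as no request for the service.
    return Action.ROUTE_NORMALLY
```

Keeping the timer expiry and the refusal code on the same branch mirrors the text, where both lead to normal routing in block 54.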

If the subscriber requests language interpreta- 
tion by entry of the service activation code, the ad- 
junct 24 will initiate a call request in block 62 to the 
PBX 36 in the language interpretation platform 30. To 
accomplish this, the adjunct 24 is preprovisioned with
the destination number of the platform 30. The ad- 
junct 24 sends a setup message to the PBX 36 includ- 
ing the directory number of the subscriber. The mes- 
sage may include the country code dialed by the sub- 
scriber and the national number of the called party. 
The phone call may be connected with a preselected 
language interpreter, as shown in block 64, in a num- 
ber of different ways. The phone call from the adjunct 
24 may be directed first to a live operator LO who may 
assist the caller in reaching a desired language inter- 
preter. The country code alone or the combination of 
country code and city code may be used by the live 
operator LO to help determine which language inter- 
preter should be used. In this regard, there may be 
circuitry in PBX 36 which produces a list of languages 
displayed to the operator and likely to be spoken in 
the geographical area represented by the country 
code and city code. Alternatively, the call from the ad- 
junct 24 may be automatically directed to an appropri- 
ate language interpreter LI without the intervention of 
a human operator LO. In this case, the language in- 
terpretation platform may include circuitry which de- 
tects the nature of the country code, city code, or 
both the country and city codes, and automatically 
brings an appropriate language interpreter LI to the 
phone call based on the predominant languages 
spoken in the geographical area represented by the 
country code and city code. 

The call to the PBX 36 is routed via the bridge 22, 
the toll switch 26, and the toll switch 28. Upon receipt 
of a connect message, the human operator or the
automatically ascertained language interpreter is
bridged onto the call. If a human operator is connected,
he or she will assist the subscriber in finding the
desired language interpreter and, if needed, will col- 
lect any other needed information from the subscrib- 
er. Once the subscriber is validated and the language 
interpreter becomes available, the call is then trans- 
ferred to the selected interpreter who is now bridged 
onto the call. At this point, conversation can take 
place between the subscriber and the interpreter dur- 
ing which the role played by the interpreter in the up- 
coming phone conversation with the called party 
may be determined. At the same time, billing for the 
services of the interpretation system may begin at the 
PBX 36. The billing for the interpretation services will 
be added to the billing for the actual telephone call. 
Since this service part of the call is billed by the PBX
36, the adjunct 24 must ensure that the subscriber's
destination number is delivered to the PBX 36 so that
the bill for the interpretation service can appear on
the regular bills relating to the caller's telephone.
Alternatively, the switches or the adjunct 24 could
produce a billing record.

When the two-way conversation between the
subscriber and the interpreter is concluded and it is
agreed to complete the actual phone call to the called
party, the subscriber or the language interpreter may
trigger the adjunct 24 with a suitable signal which will
cause the adjunct 24 to complete the international leg
of the call setup, as shown in block 66 in FIG. 2B. The
adjunct 24 will send, via the bridge 22, the
subscriber-dialed country code and national number to the toll
switch 16 and toll switch 28 for further processing and
routing to the international gateway switch 38 as an
international POTS telephone call. At this point, bridging
between the subscriber, the interpreter, and the
switch 16 is considered to be activated, but waiting for
cut-through to the international switch 38. When the
country code and national number are received by
the switch 16 from the adjunct 24, the rest of the
switches between the switch 16 and the switch 38 will
route the call to the switch 38 and thereafter to the foreign
carrier network 40. The set-up of the call is considered
complete when a cut-through message is received.
The ultimate result is that there is a conference
call created by bridging the subscriber, the interpreter,
and the called party, as shown in block 68.
Once the conference call has been created in block
68, the adjunct 24 may receive a request from one of
the conference participants to drop the interpreter
from the conference in block 69 in FIG. 2C. The adjunct
24 then drops the part of the conference circuit
connecting the language interpreter to the conference
in block 70. The adjunct then hands the call over
to the switch in block 71 and waits for a new call request
in block 72. At the end of the phone call, the adjunct
receives a request to drop all circuits involved in
that particular call as shown in block 73 in FIG. 2D.
The adjunct terminates all the circuit connections involved
in the call in block 74 and waits for a new call
request in block 75. Call disconnect will occur on receipt
of an on-hook signal from a subscriber. This on-hook
signal will trigger the adjunct 24 to release the
connection to the PBX 36 and toll switches 16, 28 and
38. The interpreter is able to drop off from the call at
any time without terminating the call. In addition, the
call could be handed over to an appropriate switch in
the network at any moment if one of the end users
determines that there is no need for language interpretation.
This handover to a switch can be initiated by
the originator of the call, for example, by dialing the
* key on a Touch Tone telephone.

The interpretation services provided by this invention
may be obtained for any direct dialed international
calls originating from residential or other sources
in any country. In addition to having this service 
available from a subscriber's own telephone number, 
the service may also be made available to subscrib- 
ers from selected other locations, for example, from 
certain airport telephones and the like. The service 
may even be made available to certain non-subscrib- 
ers who use toll-free services such as the interna- 
tional inbound services, known as I-800 services. In 
these situations, the telephone numbers of these 
other phones will be screened in block 48 in the same 
manner that the telephone numbers of the subscribers
are screened in the example of the invention
described above. In addition to a list of acceptable
telephone numbers from which interpretation services
may be obtained, the adjunct 24 will also have a look-up
table of subscriber codes associated with the normal
directory numbers of the subscribers and individual
passwords/I.D.'s. When a subscriber wishes to
use the interpretation service from a telephone other
than his or her normal telephone, the subscriber will
be prompted in the course of performing the validation
in block 56 to enter his or her normal destination
number followed by a multi-digit access code. Upon
successful validation by the adjunct 24, the call processing
may proceed in the same manner as in the example
of the invention described above.

In a menu-driven example of this invention, the 
subscriber dials the destination number of the phone 
call in block 46 and is given the previously described 
tone or announcement in block 60 relating to entry 
into the interpretation service. The subscriber then
indicates a desire to enter the service as described
above by entry of an activation code. In connection 
with performing the call connection of block 64, an 
automated voice system in the platform 30 answers 
and provides the subscriber with a menu of languag- 
es to choose from based on the country code and city 
code entered by the subscriber. For example, in a call 
to Strasbourg, France, the subscriber may be provid- 
ed with the option of choosing either a French inter- 
preter or a German interpreter. In a call to Switzer- 
land, the option of choosing a French, German, or 
Italian interpreter may be given. Selection is made by 
pressing certain dialing codes on the subscriber's tel- 
ephone. The subscriber may be given another code 
to select if he or she desires personal assistance from 
a human operator. Voice recognition circuitry may be 
provided in the language interpretation platform 30 
so that a subscriber may make a selection of inter- 
preter by voice command rather than by entry of 
codes into his or her telephone. 
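
The menu selection described above can be sketched as a table lookup keyed on the dialed country code and, where available, the city code. The table contents, the function names, the digit-to-language keypad mapping and the use of "0" to request an operator are all assumptions made for illustration; the city code shown for Strasbourg is likewise only a placeholder.

```python
# Hypothetical table: (country code, city code or None) -> languages offered.
LANGUAGE_MENU = {
    ("33", "88"): ["French", "German"],             # Strasbourg (city code assumed)
    ("33", None): ["French"],                       # rest of France
    ("41", None): ["French", "German", "Italian"],  # Switzerland
}


def menu_for(country_code, city_code=None):
    """Return the languages to offer, preferring the most specific entry."""
    for key in ((country_code, city_code), (country_code, None)):
        if key in LANGUAGE_MENU:
            return LANGUAGE_MENU[key]
    return []  # nothing known: a human operator would have to assist


def select(country_code, city_code, key_pressed):
    """Map a keypad digit ('1', '2', ...) to a language.

    '0' is assumed to request a human operator; invalid input gives None."""
    if key_pressed == "0":
        return "operator"
    if not key_pressed.isdigit():
        return None
    options = menu_for(country_code, city_code)
    index = int(key_pressed) - 1
    return options[index] if 0 <= index < len(options) else None
```

For a call to Switzerland, `select("41", None, "3")` would pick the third option offered, Italian in this illustrative table.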

In yet another example of the invention, the sub- 
scriber may be permitted to enter an additional code 
after entry of the service activation code to indicate 
a selection of an interpreter dealing in a desired lan- 
guage. In this situation, the subscriber may be given 
the opportunity of reaching a human operator by
dialing an additional code after entry of the activation
code.

There may also be stored in a subscriber's profile
in the database 25 an indication that this subscriber
normally wishes to use an interpreter fluent in a
particular language. The subscriber will then be
automatically connected to such interpreter each time
the subscriber uses the interpretation services.

In an additional variation of the invention, a caller
keys in an international telephone number and a special
short hand designation, such as the designation
produced by depressing the * key on a Touch Tone
telephone. In one example, the special designation
may be a prefix dialed before the international telephone
number is dialed and, in another example, the
special designation may be a suffix dialed after the
international number is dialed. The designation causes
the caller to be automatically connected to an
interpretation service in the network. The platform
may detect the country code in the international telephone
number and route the call to an interpreter
who speaks English and the predominant language
spoken in the country represented by the entered
country code. The telephone number from which the
call was initiated will be billed for the interpretation
service and the international phone call. The interpreter
may complete the phone call and remain bridged
onto the call to provide language translation for the
call as in other examples of the invention. 
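
Detection of the short hand designation as either a prefix or a suffix can be sketched as below. The function name and the dialed digit strings in the usage note are made up for illustration; only the * key as an example designation comes from the text.

```python
def parse_dialed(digits: str, designation: str = "*"):
    """Split a dialed string into (wants_interpreter, telephone_number).

    The designation may be dialed before the number (prefix) or
    after it (suffix); otherwise the call is treated as a normal call."""
    if digits.startswith(designation):
        return True, digits[len(designation):]
    if digits.endswith(designation):
        return True, digits[:-len(designation)]
    return False, digits
```

With a made-up number, `parse_dialed("*011331234")` and `parse_dialed("011331234*")` both yield `(True, "011331234")`, after which the country code in the number could be used to pick an interpreter.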

An interpretation service in accordance with this
invention could be configured to route calls to interpreters
residing in foreign networks. For example,
calls could be routed to interpreters connected with
PTTs or AT&T's Global Network. Collect calls could
be handled by provision of a common platform adjunct
having a database to validate the called party
and play an appropriate announcement to the called party.



Claims 






1. A telecommunications apparatus, comprising:

means for receiving a telephone call comprising
at least a telephone number of a caller and
a telephone number of a called party; and

means for detecting a predetermined
characteristic of the caller's telephone number
and automatically directing the telephone call to
an interpretation service.

2. A method of providing a language interpretation
service in a public switched telephone network,
comprising the steps of:

receiving a telephone call comprising at
least a caller's telephone number and a called
party's telephone number, and

automatically providing an interpretation
service in response to a predetermined characteristic
of the caller's telephone number.

3. The method of claim 2, in which the predeter- 
mined characteristic is whether or not the caller's 
telephone number is stored in the network as a 
telephone number associated with a subscriber 
to the interpretation service. 

4. The method of claim 3, in which the providing step
comprises the step of automatically routing the 
telephone call to a common platform adjunct in 
response to a determination that the caller's tel- 
ephone number is a telephone number associat- 
ed with a subscriber to the interpretation service. 

5. The method of claim 4, in which the providing 
step further comprises the step of validating the 
subscriber. 

6. The method of claim 4, in which the providing 
step further comprises the step of sending a 
message to the subscriber giving the subscriber 
an option to activate the interpretation service. 

7. The method of claim 6, in which the providing 
step further comprises the step of receiving a 
message from the subscriber indicating a desire 
to activate the interpretation service and auto- 
matically routing the telephone call to a language 
interpretation platform in the network. 

8. The method of claim 7, in which the providing 
step further comprises the step of receiving the 
telephone call in the language interpretation plat- 
form and connecting the telephone call to a lan- 
guage interpreter desired by the subscriber. 

9. The method of claim 8, in which the step of re- 
ceiving the telephone call in the platform com- 
prises the step of connecting the telephone call 
to a human operator and transferring the call 
from the human operator to a language interpret- 
er desired by the subscriber. 

10. The method of claim 8, in which the step of re- 
ceiving the telephone call in the platform com- 
prises the step of automatically connecting the 
telephone call to a desired language interpreter in 
response to a predetermined characteristic of the 
telephone number of the called party. 

11. The method of claim 10, in which the predeter- 
mined characteristic of the telephone number of 
the called party is a country code. 

12. The method of claim 2, in which the public switched
telephone network is a long distance telephone
network.

13. The method of claim 2, in which the public switched
telephone network is a local telephone network.

14. The method of claim 4, in which the providing
step includes the step of routing the telephone
call to a language interpreter desired by the caller.

15. The method of claim 14, in which the providing 
step further comprises the steps of: 

routing the telephone call to the called
party; and

bridging together the caller, the language 
interpreter, and the called party. 

16. The method of claim 2, in which the telephone
call is an international telephone call.

17. The method of claim 2, in which the telephone
call is a domestic telephone call.

18. The method of claim 2, further comprising the
step of:

receiving an additional caller's telephone
number, and

in which the providing step includes the
step of responding to the first mentioned caller's
telephone number and the additional caller's
telephone number to determine whether or not the
first mentioned caller's telephone number is associated
with a predetermined telephone from
which language interpretation service may be obtained
and to determine whether or not the additional
caller's telephone number is associated
with a subscriber to the language interpretation
service.

19. The method of claim 2, in which the providing
step includes the step of sending a menu of
choices to a caller relating to languages spoken
by language interpreters available to assist the
caller.

20. The method of claim 2, in which the providing
step includes the step of automatically routing
the telephone call to a language interpreter previously
selected by a subscriber for automatic
connection to the subscriber each time the subscriber
uses the language interpretation services.

21. A method of providing language interpretation in
a public switched telephone network, comprising
the steps of:

receiving a telephone call comprising a
called party's telephone number and a short
hand designation appended to the called party's
telephone number representing a request for language
interpretation; and

automatically providing language interpretation
in response to the called party's telephone
number and the short hand designation.

22. The method of claim 21, in which the short hand
designation is a suffix appended to the called
party's telephone number.

23. The method of claim 21, in which the short hand
designation is a prefix appended to the called
party's telephone number.
Editing compressed voice information.

A method and apparatus for editing the displayed
voice wave form by marking the portion
of interest on the screen is disclosed. The marked
segment may then be deleted, for example, or
copied into another segment in a second voice
editing window. In either case, pointers are 
established at the selected marker positions of 
the displayed voice segment and in the corre- 
sponding positions of uncompressed voice seg- 
ments. The voice data is treated as a stream of 
fixed-length micro-segments, where there is a 
predictable correlation between the positions of 
the compressed and uncompressed data. In the 
implementation at hand, these micro-segments 
are 20 ms. in length. Editing is accomplished by 
modifying micro-segments in both the com- 
pressed and uncompressed segments simul- 
taneously. When the user is satisfied with the 
result, the edited wave form is redrawn on the 
screen. The user may then SAVE the result, and 
the entire segment is rewritten to the data base, 
replacing the previous version. Only the com- 
pressed version is written, thus eliminating the 
need for a subsequent pass through the com- 
pression hardware with the associated com- 
pounding of distortion. 
This invention generally relates to improvements in voice messaging and more particularly to editing voice
information without affecting the fidelity of the original voice information. 

Information processing systems having a voice generating capability are presently employed as answering 
machines, voice messaging systems, voice response units and in general as intelligent peripherals. The voice 
signal may be prerecorded on audio tape or may be digitized, compressed and stored, for example, on a
magnetic disk.

A typical application couples the information processing system to one or more phone lines, the system 
detecting the occurrence of a ring signal and answering the phone. Often a standard prompt voice message 
is sent to the phone line. Depending on the type of system the caller may depress certain buttons on a
Touch-Tone phone set in order to inform the system of a specific type of action desired by the user. For example, after
hearing the message, the information processing system may have access to a large data base, such as a data 
base containing stock quotations. The caller may signal the system to access one or more quotations from the 
data base whereafter the system converts the quotation to an audio voice signal which is output to the caller's 
phone line. 

As can be appreciated, for such systems the interaction between a caller and the system may become quite
complex. As a relatively simple example, if the caller desires to learn if any voice messages are stored for the 
caller the system may respond with a voice signal such as "you have three new voice messages". In generating 
this response the number "three" is a variable which is determinable at the time that the caller is connected to 
the system. 

Furthermore, the word "messages" is also a variable in that if only one voice message is pending the
singular form "message" should be returned and not the plural form. It can thus be appreciated that the ability to
accurately define a series of system responses to an incoming call is an important aspect of such a voice
response system.
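
The two variables in the example response, the number and the singular or plural noun, can be sketched as follows. The function name and the small number-word table are assumptions for illustration; an actual system would assemble recorded voice segments rather than text strings.

```python
# Illustrative subset of number words; larger counts fall back to digits.
NUMBER_WORDS = {0: "no", 1: "one", 2: "two", 3: "three"}


def new_message_prompt(count: int) -> str:
    """Assemble the prompt text for the number of pending messages."""
    number = NUMBER_WORDS.get(count, str(count))
    # The noun is itself a variable: singular only when exactly one is pending.
    noun = "message" if count == 1 else "messages"
    return f"you have {number} new voice {noun}"
```

Both the number word and the noun form are resolved only at the time the caller is connected, which is why a fixed prerecorded sentence would not suffice.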

Also, it is preferable that a voice applications writer be able to create and modify the system responses in
a relatively uncomplicated and time efficient manner. That is, the operator of the system should be able to
interact with the voice response system to create and modify voice responses in a manner which does not
require the direct assistance of the provider of the system or the direct assistance of skilled programming
personnel.

Many business applications can be automated with voice processing technology. A business can use voice
processing equipment to call its clients and deliver or solicit information. Alternatively, business customers can
call into a firm's voice processing unit to obtain information, place orders, or transfer to human service agents
or other response equipment. Other applications can employ voice processing equipment to exchange
information with other call handling equipment without human intervention.

In most cases, the call originating or call transferring automated equipment must be able to communicate
information to a user on the basis of dynamic information entered by a user. An example of a prior art call
processing system that can benefit from incorporating the subject invention is US Patent 4,627,001, which
discloses a voice editing data system using a display system for editing recorded speech. US Patent 4,779,209
discloses another system for editing voice data. US Patent 4,853,952 discloses yet one more text editing system
for editing recorded voice signals. US Patent 4,766,604 discloses yet another voice message handling system
which includes voice prompts. Finally, US Patent 4,920,558 describes a speech file downloading system
wherein static voice prompts are recorded.

In a voice-based computer application system it is often necessary to edit recorded voice prompts in order
to produce natural-sounding results. However, this voice data is typically stored in compressed form in order
to minimize data traffic and storage consumption within the system. Moreover, repetitive editing of the same
voice information introduces increasing distortion into the edited result owing to the approximations in the
compression and decompression techniques involved. These distortions may be particularly severe in high-rate
compression.

The present invention seeks to address these problems and accordingly provides, in one aspect, a method
for editing compressed voice information, comprising the steps of: selecting a segment of compressed voice
information; decompressing the compressed voice segment and displaying the decompressed voice
information on a display; marking a portion of the displayed voice information; calculating the location and extent
of a portion of the compressed voice segment corresponding to said marked portion; editing the marked portion
of the decompressed voice information; and correlating the editing actions from the decompressed voice
information to the compressed voice information.

In a second aspect of the invention, there is provided an apparatus for editing compressed voice
information, comprising: means for selecting a segment of compressed voice information; means for decompressing
the compressed voice segment and displaying the decompressed voice information on a display; means for
marking a portion of the displayed voice information; means for calculating the location and extent of a portion
of the compressed voice segment corresponding to said marked portion; means for editing the marked portion
of the decompressed voice information; and means for correlating the editing actions from the decompressed
voice information to the compressed voice information.

Using the Voice Application Generator tool of the preferred embodiment of the invention, the user (i.e., the
person performing the voice editing) invokes a voice generation screen. She selects a voice segment of interest,
and the "modify" option. A window is opened and the analog wave form of the selected voice segment is
presented. This wave form is created by causing the selected compressed voice segment to be looped through
voice decompression hardware and returned in clear channel (i.e., uncompressed) form. Both the compressed
and decompressed voice forms are retained in memory.

The user proceeds to edit the displayed voice wave form by marking the portion of interest on the screen.
Tools are available (e.g., ZOOM) to improve the user's ability to identify the exact position to be marked. The
marked segment may then be deleted, for example, or copied into another segment in a second voice editing
window. In either case, pointers are established at the selected marker positions of the displayed voice segment
and in the corresponding positions of uncompressed voice segments.

The voice data is treated as a stream of fixed-length micro-segments, where there is a predictable
correlation between the positions of the compressed and uncompressed data. In the implementation at hand, these
micro-segments are 20 ms. in length. Editing is accomplished by modifying micro-segments in both the com- 
pressed and uncompressed segments simultaneously. 
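
The correlation between marker positions and the two data streams can be sketched as follows. Only the 20 ms micro-segment length comes from the text. The 8000 Hz sampling rate, one byte per uncompressed sample, the 20-byte compressed micro-segment size, the function names and the choice to snap marks outward to micro-segment boundaries are all assumptions made for illustration, and the sketch treats each compressed micro-segment as independently decodable.

```python
MICRO_MS = 20        # micro-segment length given in the text
SAMPLE_RATE = 8000   # assumed telephone sampling rate in Hz
RAW_BYTES = SAMPLE_RATE * MICRO_MS // 1000  # 160 bytes, one byte per sample (assumed)
COMP_BYTES = 20      # hypothetical compressed size of one micro-segment


def marked_range(start_ms: int, end_ms: int):
    """Snap a marked time range outward to whole micro-segment indices."""
    first = start_ms // MICRO_MS
    last = -(-end_ms // MICRO_MS)  # ceiling division
    return first, last


def delete_marked(raw: bytes, comp: bytes, start_ms: int, end_ms: int):
    """Delete the same micro-segments from both streams simultaneously."""
    first, last = marked_range(start_ms, end_ms)
    raw_out = raw[:first * RAW_BYTES] + raw[last * RAW_BYTES:]
    comp_out = comp[:first * COMP_BYTES] + comp[last * COMP_BYTES:]
    return raw_out, comp_out
```

Because the same micro-segment indices are applied to both buffers, the compressed stream stays aligned with what is displayed and never has to be regenerated from the uncompressed copy.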

When the user is satisfied with the result, the edited wave form is redrawn on the screen. The user may
then SAVE the result, and the entire segment is rewritten to the data base, replacing the previous version.
Only the compressed version is written, thus eliminating the need for a subsequent pass through the
compression hardware with the associated compounding of distortion.

A preferred embodiment of the invention will now be described, by way of example only, with reference to 
the accompanying drawings in which: 

Figure 1 is a system diagram of the voice application enabler apparatus in accordance with the subject
invention;
Figure 2 is a system diagram of the internal components of the voice application generator in accordance
with the subject invention;
Figure 3 is a system diagram of the internal components of the voice application enabler in accordance
with the subject invention;
Figure 4 is a block diagram of a state table in accordance with the subject invention;
Figure 5 is a block diagram of the data base files in accordance with the subject invention;
Figure 6 is a system diagram of the general purpose server development process in accordance with the
subject invention;
Figure 7 is an illustration of a prompt segment editor display in accordance with the subject invention;
Figure 8 is an illustration of a prompt editor panel in accordance with the subject invention;
Figure 9 is an illustration of a state editor panel in accordance with the subject invention;
Figure 10 is a block diagram of a phone service application in accordance with the subject invention;
Figure 11 is an internal state table in accordance with the subject invention;
Figure 12 is a block diagram illustration of how a complex variable is played in accordance with the subject
invention;
Figure 13 is a block diagram of the national language support parameter numbers in accordance with the
subject invention;
Figure 14 is a block diagram of the state table managers in accordance with the subject invention;
Figure 15 is a flowchart of a play variable in accordance with the subject invention;
Figure 16 is a flowchart of editing voice segments in accordance with the subject invention;
Figure 17 is a flowchart of national language support setup in accordance with the subject invention;
Figure 18 is a flowchart of national language support application development in accordance with the
subject invention;
Figure 19 is a flowchart of national language support execution in accordance with the subject invention;
Figure 20 is a flowchart of preparation for playing complex variables in accordance with the subject
invention; and
Figure 21 is a flowchart of playing complex variables in accordance with the subject invention.

Referring to Figure 1, the apparatus for performing a voice related application includes a RISC
System/6000 (trademark of IBM Corporation) 10 with on board RAM for executing a variety of applications including
AIXwindows (AIXwindows is a trademark of IBM Corporation) 60, Voice Applications Enabler (VAE) 70, and a
data base 50 for storing and transferring static and dynamic voice information to the memory of the RISC
System/6000. The data base 50 also contains means for storing rules for vocalizing the voice information. The rules
are interpreted by the Voice Applications Generator (VAG) 80 and used to vocalize the dynamic and static voice
messages into the proper user prompt. A terminal 20 is attached to the RISC System/6000.

Using standard IBM hardware and operating systems, the Voice Application Enabler (VAE) connects
callers with application servers and manages the sessions, with the telephone as the interactive medium. Principal
elements of the system are the Application Server Interface (ASI), Voice Application Generator (VAG), and the 
Host which is a host system providing data base access and storage. 

The VAE provides facilities and connectivity for customer implementation of voice-data applications. It uses 
telephone lines to connect a customer-premises or central office switch to provide dual-tone multi-frequency 
(DTMF) signalling recognition for state-of-the-art voice compression and decompression. Customers can 
develop application scripts and voice prompts with an easy-to-use, high level application-specific language or 
graphic interface and other development tools. 

Several publications describe the operating system tools used to implement the invention. IBM Advanced 
Interactive Executive for Personal System/2 (AIX PS/2) IBM AIX (AIX is a trademark of IBM Corporation) 
Operating System Commands, IBM Publication Number - SC23-2025-0. IBM AIX Operating System; Technical 
Reference -SC23-2032-0 IBM AIX Operating System; Tools and Interfaces - SC23-2029-0. 

The Voice Application Enabler (VAE), Prototype, is designed to support interactive transaction processing 
and to integrate voice and data applications. The major characteristics of the VAE are: 

O Telephone switch interfaces; 

O Advanced voice compression; 

O Telephone used as interactive medium; and 

O Application level customer programmability. 

The voice-data application addresses two identifiable environments: 

O Central office (CO) telephone company enhanced service provider; and 

O Customer premises computer assisted telephony, using a PBX or CO switch. 

The VAE provides facilities and connectivity for customer implementation of voice-data applications. It uses
telephone lines to connect to a switch, and provides dual-tone multi-frequency (DTMF), switch signalling
recognition, and voice compression and decompression. In addition, it supports rotary dial telephones when voice
recognition is operational. Users can develop application scripts and voice prompts with an easy-to-use,
high-level, application-specific language or graphics interface, and other development tools. Users can also conduct
application sessions with a Host computer using the telephone. Examples of such applications include voice
messaging, voice information services, and data base applications.

In the CO environment, generality of function, scalability, reliability and support of multiple switches are
critical requirements, while in the customer premises environment, entry cost, broad functionality and
multi-switch compatibility are key elements. Both environments share common technological dependencies:
multi-channel operations, real-time response, state-of-the-art voice compression, and telephone channel signalling
recognition and control.

The VAE accommodates a variety of line interface protocols ranging from a few analog voice channels to 
multiple T1's and Integrated Services Data Network, both domestic and worldwide. It is easily tailored by the 
customer to suit his or her application. 

The VAE is designed as a multilingual system. Support is provided for single-byte, left-to-right languages 
only. Special provisions will be made for other language requirements, such as bidirectional Hebrew and Arabic, 
or other single- or double- byte languages. The VAE implements enabling language function and translation 
for hardware and software and for translation of customer and service information, such as messages, help 
panels, documentation and nomenclature. 

Typical Scenario 

In a typical scenario, a user calls a telephone number and is directed by the CO switch into the VAE system.
The voice system interacts with the user through dynamically assembled voice prompts and phrases, and the 
user interacts with the voice system through the voice and the DTMF touch pad on his or her telephone. 

VAE allows the caller to interact with existing customer data base applications, and to transfer to a live 
operator when needed. For example, in a voice-messaging application, the user may be prompted to leave or 
retrieve a message. In a voice response application, he may hear a weather forecast or inquire about his bank 
balance or order a pizza. 

In each case, sub-second response and unbroken voice streams allow users to interact with the system 
as a single image without regard to call routing. In this way, each customer is able to program the system so 
that it is user friendly, and interacts with him in a manner appropriate to the application being performed.

T1/CEPT Signalling

VAE is required to interface with a variety of CO switches. The preferred medium is T1-D4 in the United 
States and CEPT in Europe.

Telephony Interface Protocol

Line information comes from a separate Simplified Message Service Interface (SMSI) channel in each 
implementation. This signalling also comes from the call setup information from existing tariffed signalling ser- 
vices, such as the SMSI. 

As the Integrated Services Digital Network (ISDN) evolves into general use, the VAE should adopt its interface
protocol as the preferred technique. Some CO switches may not support the required interface protocol services 
in conjunction with the digital T1 service. In such cases, personalized greetings and message waiting indi- 
cations are not likely to be implemented. To accommodate analog telephone systems a channel bank is 
required to convert analog signals to digital signals. 

Topology and Scalability 

The VAE supports voice channels ranging from a few voice channels to over a thousand. Where redun- 
dancy and capacity requirements are low, the system is designed to operate on a single, standalone hardware 
platform. Where they are high, the system design permits n + 1 redundancy and distributed function, so that 
very large configurations retain a single system image to the user. 

Design Structure 

VAE is an application system, and it operates entirely under AIX, Version 3. VAE operates on IBM RISC 
Model (RISC System/6000). Users interact with the VAE using a high-level application language consisting of 
pre-programmed primitives. In addition, the customer will define and program application servers to supplement 
those provided in the standard release. 

Each VAE node will support up to 72 voice channels in the United States and 90 voice channels in Europe. 
Overall system performance is application dependent. Limited multilingual support is intrinsic to the design
and will be in place for subsequent releases. 

SYSTEM ARCHITECTURE 

The VAE system architecture is shown in Figure 1. The system consists of three logical elements: 
O Front-end ASI 90;

O Back-end Host data base server 50; and 

O Operations Console (VAG 80, SAF, UAF, and OAM). 

The most general manifestation of the system architecture allows for multiple ASIs and Host servers. In
the general architecture, the ASI and Host server components are linked using one or more local area networks 
or by direct attachment to a data base Host.

The hardware base for the subject invention is the IBM RISC System/6000 10. The operating system is 
RISC/6000/AIX, a UNIX derivative, with significant real time multi-tasking capabilities. The invention employs 
a back-end data base server 50 designed to exploit a set of sequential files, which are accessed at the field 
level, and indexed sequential files, which are accessed at the field and/or record level. Field-level access pro- 
vides relational data base management system capabilities. The function of the data base server is to retrieve 
and store application scripts, system parameters, voice data, and subscriber information. 

The database server 50 provides intelligent data base services to the ASI and VAG on demand. The data 
base server function operates on the same physical hardware platform as the ASI application. An architected 
interface between the ASI functions and the data base server functions is maintained in common for combined 
and distributed configurations. The data base functions are provided by IBM. A custom server architecture 
enables customers to define and develop functions of their own design using the same Data Base Management 
Services as used by the VAE system servers. The peculiarities of the local server implementation, however, 
are insulated from the ASI by the architected interface. 

FUNCTIONAL CHARACTERISTICS

The basic facilities of the VAE system, as illustrated in Figure 2, provide support to end-user applications 
including voice information services, voice messaging and voice interactive transaction processing, similar to 
IBM's Voice Response Unit (VRU). Other operations are built on these basic functions or their variants.

Call Transactions 

Call transactions begin with the establishment of an active telephone connection with the CO at the ASI. 
Identifying call information, such as called party number, determines the selection of a script that describes
and controls the specific type of transaction. An ASI is able to handle all types of transactions for all subscribers
and callers. There is no prior binding of caller or transaction type to specific servers or to trunks/channels. 

The call transaction control function in the ASI follows the script for the duration of the active transaction. 
The script contains the list of actions to be performed, such as play a prompt or receive digit, and the conditional 
sequencing of the actions in the list. The action library resides in the ASI and the script, being customizable
for each transaction and subscriber, must be fetched from the data base through the Host server. Common 
prompts are cached in the ASI also, while customized prompts and greetings reside in the data base. 

Compression and Decompression 


Compression and decompression of voice signals occur within the ASI immediately after receipt from the 
CO and before being sent back to the CO. Multiplexing and de-multiplexing of the T1/CEPT channel are also 
done at this point. Thus, the CO always perceives normal voice. DTMF detection is incorporated in the line
interface hardware, permitting user interaction with the system.

Digit information is removed from the information by the control function. In the case of messaging, the
compressed incoming message is stored in the data base in segments during the recording. Recorded mes- 
sages are retrieved in segments, each decompressed and sent out while successive segments are fetched in 
a manner ensuring smooth voice regeneration. Prompts, greetings and other voice responses are played back 
in the same manner. 

The compressed voice segments flow over the link between the ASI and Host server. In the Prototype, the
link between the ASI and the Host is a logical construct designed to maintain compatibility and configuration 
flexibility in later releases. Functions in the Host provide data base access and other support needed by the 
ASI applications, such as updating user profile information. In addition to the voice segments, control infor- 
mation flows between the ASI and Host functions during the transaction. 

Application support from the Host may schedule subsequent activity to complete the transaction, such as
interpreting and passing the transaction to a remote data base system. The interface provided in the General 
Purpose Server allows the customer application to interact with the VAE. It receives transaction information 
from the ASI, activates functions imbedded in the application program, and then returns the results back to the 
ASI. 


MAJOR COMPONENTS 

Figure 3 illustrates the major components of the VAE system: 

O Application Server Interface 200; 
O Host 210;

O Operations, Administration, and Maintenance (OAM) 220; 

O System Administrator Facilities (SAF) 230; 

O User Administration Facilities (UAF) 240; and

O Voice Application Generator (VAG) 250. 

This section introduces each of these components with a brief description. Following the VAG section is
a section listing and describing the VAG application actions. 

APPLICATION SERVER INTERFACE (ASI) 

The telephony system connects to the VAE through the Application Server Interface (ASI) 200 of Figure 3.
These connections are the trunk channels over which telephone calls are transmitted. In addition, Simplified
Message Service Interface (SMSI) 205 is a separate signalling link over the serial I/O port that conveys
information relative to the telephone calls, such as source and destination identification. The ASI 200 handles
multiple calls at one time, performs all the logic, and contains all the states for each call. The number of message
servers configured in an ASI system is a function of the number of connected trunks. Each ASI can connect 
to a minimum of one T1 carrier with twenty-four channels in the United States and to CEPT with thirty channels 
in Europe. 

Compression and decompression of voice traffic to and from the trunk channels also occurs in the ASI 200.

The ASI does not provide disk storage for the voice messages. Voice messages are sent to the Host for storage 
and retrieved from the Host 210 when requested. The voice messages are always conveyed between the ASI 
and the Host 210 in a compressed form, with a compression ratio of five to one plus pause compression. Clear 
channel mode(1) is included in the VAE architecture. 
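
To give a feel for what five-to-one compression means for storage, the following sketch estimates the size of a recorded message. It assumes the standard 64 kbit/s rate of a single T1 or CEPT voice channel, which the text does not state, borrows the 4K buffer size used elsewhere in the VAE, and ignores pause compression, so the figures are upper bounds rather than measured values. The function names are hypothetical.

```python
# Rough storage estimate for compressed voice. Assumptions (not from the text):
# a 64 kbit/s channel rate and no savings from pause compression.
CHANNEL_BITS_PER_SECOND = 64_000
COMPRESSION_RATIO = 5      # "five to one", as stated in the text
BUFFER_BYTES = 4096        # 4K buffer size used elsewhere in the VAE


def compressed_bytes(seconds: int) -> int:
    """Bytes needed to hold `seconds` of speech after 5:1 compression."""
    raw_bytes = CHANNEL_BITS_PER_SECOND // 8 * seconds
    return raw_bytes // COMPRESSION_RATIO


def buffers_needed(seconds: int) -> int:
    """Number of 4K buffers needed for the compressed speech, rounded up."""
    return -(-compressed_bytes(seconds) // BUFFER_BYTES)


if __name__ == "__main__":
    print(compressed_bytes(60), buffers_needed(60))
```

Under these assumptions a one-minute message occupies 96,000 bytes, or 24 buffers, against 480,000 bytes uncompressed.
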
ASI is built on a RISC/6000 base and uses standard IBM components, with the addition of required cards
for switch connections and voice compression. The ASI consists of:

O Voice Card Set 260, 

O Voice Driver 270, 

O Signalling Driver 280, 
O Cache Manager 300,

O Node Manager 310, 

O State Table Manager 320, 

O Prompt Directory Manager 330, and 

O Data Base Interface Manager 340. 

Voice Card Set

The Voice Card Set performs voice compression/decompression and signalling management on multip- 
lexed telephone channels. It consists of the following cards: 

VOICE SERVER CARD (VSC): 



Performs voice signal processing and interfaces to the ASI RISC System/6000 Host. It contains a master
signal processor (SPM, or SP0) and five slave signal processors (SPS, or SP1-SP5). Three slave processors
are located on a daughter card.

TRUNK INTERFACE CARD (TIC): 

This is the VSC daughter card that performs the interface between the VSC and the multiplexed digital
telephone line. There is a version for T1 and a version for CEPT. The TIC features an Intel 8751 microcontroller.

VOICE SERVER CARD ADAPTER (VSCA):

Plugs into the microchannel bus and acts as an interface between the Host 210 and the VSC 260. The
VSCA has no processor, and it can be thought of as a translation mechanism between the microchannel bus
and the VSC bus 260. The VSC 260 and TIC combination is referred to as the VPACK. The VPACKs (up to
six) reside in a 7866 rack-mount modem package.



Voice Driver 


The Voice Driver code is written as an AIX device driver. It operates synchronously with the voice card set, 
fielding interrupts and moving voice data and status information to and from the voice card set. The interrupt 
routines perform the following functions: 

O Read the status of each channel; 
O Move raw voice data to and from the appropriate channels;

O Move encoded voice data to and from appropriate channels; 

O Retrieve voice signals from cache for playing cached prompts; 

O Receive and send signalling information for each voice channel; 

O Process signalling information for each voice channel; 
O Read and write VPACK card level status and commands; and

O Field and post alarm conditions. 






EP 0 484 070 A2



Signalling Driver 



The Signalling Driver software 280 is the code that interfaces with the SMSI 205 in order to process mes- 
sages to and from the Central Office (CO). SMSI is the telephone company's protocol for messages that travel 
to and from the CO.

There are four types of messages that travel to and from the CO. They are: 

CALL HISTORY: 

Travels from the CO to the ASI. It carries information that identifies a call as it comes into the ASI 200 from a CO
on a T1 line.

MESSAGE WAITING INDICATOR (MWI):

Travels from the ASI to the CO. It signals the CO to light the message waiting lamp, initiate a stutter tone
on the customer's telephone, or turn off the message waiting lamp.

NEGATIVE MWI:

Is the response from the CO to the ASI that the CO is unable to process the corresponding MWI message.

UPDATE SWITCH: 

Travels from the CO to the ASI. It requests the transmission of all pending MWI messages from the ASI.

Channel Processes

Channel processes are the vehicles for the ASI's application logic, with one active Channel Process for each
active session. The Session Manager is the primary execution component in the Channel Process and is
responsible for interpreting the user's application script into a script table. The application script, which is
prepared using the VAG, consists of actions, parameters, and a table of conditional program flow parameters
for the various possible return codes, or edges.

Examples of edges include: key x pressed, caller hung up, and time out. Code for the State Machine and
all the actions is re-entrant and is shared among all Channel Processes. A portion of a typical state table is
shown in Figure 4 for the bus schedule.

INTERNAL STATE TABLE:

The first state table (internal state table) executed is hardcoded in the State Machine. It contains the
following actions:

IDLE        Wait for notification from the Node Manager.
ANSWERCALL  Get the user profile, get the state table, then answer the call.
PLACECALL   Get the user profile, get the state table, and place the call.
ENDCALL     Ensure the line is on the hook, reassign the line to the Node Manager, and do any other
            clean-up work that is needed.

At Channel Process start-up, the State Machine is called with a pointer to the hardcoded internal state table 
and the first entry executed is the Idle action. When notified, Idle wakes up, checks if the request is to answer 
or place a call, and then returns the appropriate return code to pass control to the AnswerCall action for incoming 
calls. All the actions that follow depend on the return codes. These return codes indicate that an action is
complete or that some external event has occurred.

When the entire transaction is complete, the CloseSession action is executed. This is the last action that 
is performed in a state table, at which time, control is returned to the State Machine's internal state table. The 
State Machine then executes the EndCall action and then, again, the Idle action. 

The State Machine also supports nested state tables. By defining state tables, the user can develop a library 
of commonly used functions and link them together to create larger and more sophisticated applications. This
is accomplished by making the State Machine recursive and by using the CallStateTable and ExitStateTable 
actions. 
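
The interplay of actions, edges and nested state tables can be sketched as follows. This is a minimal illustration under stated assumptions, not the VAE implementation: the table layout (action, argument, edge map), the action names PlayPrompt and GetKey, and the edge names are hypothetical, while CallStateTable, ExitStateTable and CloseSession are the actions named in the text.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One state-table entry: (action name, argument, {edge: index of next entry}).
# This layout is an assumption made for illustration.
Entry = Tuple[str, Optional[str], Dict[str, int]]
Action = Callable[[dict, Optional[str]], str]


def run_table(name: str, tables: Dict[str, List[Entry]],
              actions: Dict[str, Action], session: dict) -> str:
    """Run one state table and return the edge it exits with."""
    table = tables[name]
    index = 0
    while True:
        action, arg, edges = table[index]
        session["trace"].append(action)
        if action in ("ExitStateTable", "CloseSession"):
            return arg or "ok"
        if action == "CallStateTable":
            # Recursion is what makes nested state tables possible.
            edge = run_table(arg, tables, actions, session)
        else:
            edge = actions[action](session, arg)
        index = edges[edge]  # the return code (edge) selects the next entry


def play_prompt(session: dict, prompt: Optional[str]) -> str:
    session["played"].append(prompt)
    return "done"


def get_key(session: dict, _arg: Optional[str]) -> str:
    if session["keys"]:
        session["digit"] = session["keys"].pop(0)
        return "key"
    return "timeout"


ACTIONS = {"PlayPrompt": play_prompt, "GetKey": get_key}
TABLES = {
    "main": [
        ("PlayPrompt", "welcome", {"done": 1}),
        ("CallStateTable", "get_digit", {"ok": 2, "failed": 3}),
        ("PlayPrompt", "schedule", {"done": 3}),
        ("CloseSession", "ok", {}),
    ],
    "get_digit": [
        ("GetKey", None, {"key": 1, "timeout": 2}),
        ("ExitStateTable", "ok", {}),
        ("ExitStateTable", "failed", {}),
    ],
}


def new_session(keys: List[str]) -> dict:
    return {"keys": list(keys), "played": [], "trace": []}
```

Calling `run_table("main", TABLES, ACTIONS, new_session(["5"]))` plays both prompts, whereas a session with no key presses takes the timeout edge and skips the schedule prompt. Because the actions keep all per-call data in the session dictionary, the same code can serve every Channel Process, which mirrors the re-entrancy described above.
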
Cache Manager

The Cache Manager controls the storage and mapping of memory for voice segments. Voice segments 
are stored in shared memory. The amount of cache required to store voice segments is determined by the cus- 
tomer's application requirements. The system configuration facility permits the adjustment of the amount of 
cache. 

Caching is transparent. The requesting function is presented with a pointer that indicates a list of voice 
segment pointers. If the voice segment is already in memory, the return is immediate. If the voice segment is 
not in memory, the first segment is retrieved from the Host data base and its pointer is passed to the requesting 
function. The Cache Manager continues to read in the rest of the voice segment independent of the requesting 
function. 
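
A minimal sketch of this transparent behavior is given below, assuming a background thread stands in for the independent read of the remaining data. The class and method names, the `fetch` callback that represents a Host data base read, and the per-segment buffer counts are all hypothetical, and requests are assumed to arrive from a single thread.

```python
import threading
from typing import Callable, Dict, List


class CacheManager:
    """Returns the first buffer at once and loads the rest in the background."""

    def __init__(self, fetch: Callable[[str, int], bytes],
                 buffer_counts: Dict[str, int]):
        self.fetch = fetch                  # stands in for a Host data base read
        self.buffer_counts = buffer_counts  # number of buffers per voice segment
        self.directory: Dict[str, List[bytes]] = {}
        self.loaders: Dict[str, threading.Thread] = {}

    def request(self, segment_id: str) -> List[bytes]:
        buffers = self.directory.get(segment_id)
        if buffers is None:
            # Cache miss: fetch only the first buffer before returning.
            buffers = [self.fetch(segment_id, 0)]
            self.directory[segment_id] = buffers
            loader = threading.Thread(target=self._load_rest,
                                      args=(segment_id, buffers))
            self.loaders[segment_id] = loader
            loader.start()
        return buffers  # cache hit: the return is immediate

    def _load_rest(self, segment_id: str, buffers: List[bytes]) -> None:
        for index in range(1, self.buffer_counts[segment_id]):
            buffers.append(self.fetch(segment_id, index))

    def wait(self, segment_id: str) -> None:
        """Block until the background load for a segment has finished."""
        loader = self.loaders.get(segment_id)
        if loader is not None:
            loader.join()
```

The requester receives the same list object on every call, so buffers appended by the loader become visible to it without a second request.
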

The Cache Manager stores voice data in 4K buffers and contains two control blocks. The first control block 
is a directory that contains a series of maps that describe the location of each of the voice segments in memory. 
The second is a short-term control block that is used to make a logical connection between the segments that 
are requested and their requestor(s).

An adjunct process called compress voice segments runs at predetermined intervals to reorganize the cache
memory in order to recover memory space that has been unavailable because of fragmentation. 

Node Manager: 

The Node Manager is considered the parent of the other processes running in the ASI, including the Chan- 
nel Processes. The Node Manager is responsible for loading and initializing the entire system; reading and inter- 
preting initialization parameters from the system tables; and reading permanent voice segments, state tables, 
and prompt directories into memory. It assigns Channel Processes to events when activity is indicated on an 
idle voice channel and serves as the catcher for unsolicited requests that come from the Host. Exceptional con- 
ditions, such as alarms and alerts, are processed first by the Node Manager. The Node Manager also sets a 
deactivated status on selected channels for maintenance. 

The buffer pool, which is managed by the Node Manager, is a pool of 4K buffers available for allocation to 
other system managers and Channel Processes. Any process, such as a Channel Process, or any system man- 
ager requests a buffer through a function call. Then, when the process or manager no longer needs its 4K buffer, 
it returns it with another function call. 
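
The request/return pattern for 4K buffers might look like the sketch below. The class name, the method names and the choice to return `None` when the pool is exhausted are assumptions made for illustration; the text only states that buffers are obtained and returned through function calls.

```python
from typing import List, Optional


class BufferPool:
    """A fixed pool of 4K buffers handed out and returned by function call."""

    BUFFER_SIZE = 4096

    def __init__(self, count: int):
        self._free: List[bytearray] = [bytearray(self.BUFFER_SIZE)
                                       for _ in range(count)]

    def get_buffer(self) -> Optional[bytearray]:
        """Return a free buffer, or None when the pool is exhausted."""
        return self._free.pop() if self._free else None

    def return_buffer(self, buf: bytearray) -> None:
        """Give a buffer back to the pool, clearing its contents first."""
        if len(buf) != self.BUFFER_SIZE:
            raise ValueError("not a pool buffer")
        buf[:] = bytes(self.BUFFER_SIZE)
        self._free.append(buf)

    @property
    def available(self) -> int:
        return len(self._free)
```
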

State Table Manager: 

The State Table Manager provides access to state tables and maintains copies of as many state tables 
as needed, up to a predefined limit. The Channel Process accesses state tables to interact with calling parties.
If a Channel Process requests a state table that is not currently in memory, the State Table Manager requests 
that state table from the data base and notifies the Channel Process when it is available. 

The State Table Manager continues to read in state tables until the predefined number is in memory. When 
this number is reached, the State Table Manager replaces the non-active tables with new tables. If there are 
additional requests for state tables that are not in memory, the State Table Manager removes the least recently 
used state table, if not currently in use, and requests the new one to be read. The Channel Process is respons- 
ible for notifying the State Table Manager when it is no longer using a state table. 

Other features of the State Table Manager include processing an invalidate state table request, which
means that a state table has been updated by the VAG, and differentiating between tables that are fixed in
memory and those that are not.
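
The replacement policy described for the State Table Manager resembles a least-recently-used cache with in-use counts, fixed entries and invalidation. The sketch below is one way to express it; the class, its method names, the `load` callback standing in for a data base request, and the choice to raise an error when nothing can be evicted are assumptions, not details from the text.

```python
from collections import OrderedDict
from typing import Callable, Dict, Set


class StateTableManager:
    def __init__(self, limit: int, load: Callable[[str], dict]):
        self.limit = limit            # predefined number of resident tables
        self.load = load              # stands in for a data base request
        self.tables: "OrderedDict[str, dict]" = OrderedDict()  # oldest first
        self.in_use: Dict[str, int] = {}
        self.fixed: Set[str] = set()      # tables that are never evicted
        self.invalid: Set[str] = set()    # tables updated by the VAG

    def acquire(self, name: str) -> dict:
        if name in self.invalid:
            # An invalidated copy is dropped and re-read on the next request.
            self.invalid.discard(name)
            self.tables.pop(name, None)
        if name not in self.tables:
            if len(self.tables) >= self.limit:
                self._evict()
            self.tables[name] = self.load(name)
        self.tables.move_to_end(name)     # mark as most recently used
        self.in_use[name] = self.in_use.get(name, 0) + 1
        return self.tables[name]

    def release(self, name: str) -> None:
        """Called by a Channel Process when it stops using a table."""
        self.in_use[name] -= 1

    def invalidate(self, name: str) -> None:
        if name in self.tables:
            self.invalid.add(name)

    def _evict(self) -> None:
        victim = next((n for n in self.tables
                       if n not in self.fixed and self.in_use.get(n, 0) == 0),
                      None)
        if victim is None:
            raise MemoryError("all resident state tables are fixed or in use")
        del self.tables[victim]
```

The same structure would apply to the prompt and variable segment directories handled by the Prompt Directory Manager.
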

Prompt Directory Manager: 

The Prompt Directory Manager provides access to prompt directories and variable segment directories, 
and maintains copies of as many prompt directories and variable segment directories as are needed, up to a 
predefined limit. The Channel Process accesses prompt directories and variable segment directories to play 
prompts for calling parties. 

If a Channel Process requests a prompt directory and a variable directory that are not currently in memory,
the Prompt Directory Manager requests them from the data base and notifies the Channel Process when they 
are in place. The Prompt Directory Manager continues to read in prompt directories and variable segment direc- 
tories until the predefined number is in memory. If there are additional requests for prompt directories and/or 
variable segment directories not in memory, the Prompt Directory Manager removes the least recently used 
ones, if not currently in use, and requests the new ones to be read.

The Channel Process is responsible for notifying the Prompt Directory Manager when it is no longer using
a prompt directory and a variable segment directory. Whenever a prompt directory is updated, the Prompt
Directory Manager is notified. If the prompt directory is in memory, the Prompt Directory Manager flags it as
invalidated. A subsequent request from a channel for this prompt directory causes the Prompt Directory Manager
to request the updated prompt directory from the data base.

Prompt directories are kept in memory in 4K buffers. The number of 4K buffers required for a prompt
directory depends on the number of prompts in the prompt directory and the length of each prompt. There is no limit
to the number of prompts a prompt directory may have. The number of 4K buffers a prompt directory can occupy
is limited by the number of 4K buffers available.

Prompt directories are either permanent or temporary. Once read into memory, permanent prompt
directories stay fixed in memory. Temporary prompt directories remain in memory until the memory they occupy is
needed to satisfy other prompt directory requests or the Prompt Directory Manager is notified by the Node
Manager that it must release 4K buffers. In either case, only temporary prompt directories that are not currently in
use are released.

Data Base Interface Manager 

The Data Base Interface Manager provides the interface between the ASI processes and the data base 
servers. The common protocol for processing all requests for data base service and the responses to these
requests is the Data Processing Request Block (DPRB). 

DPRB Structure 

The Data Base Interface Manager maintains a control table to keep track of outstanding requests that 
require response from the data base servers. The information contained in this table includes: 
O Requestor id 
O DPRB number, and 
O Time of the request. 

The requestor id enables the system to return the request once the Data Base Interface Manager gets a
response from the data base servers. Requests and responses may consist of a single DPRB or multiple
DPRB's. The time of the request allows the system to pinpoint when a request has timed out. There are some
requests that do not require a response from the data base servers. These requests are passed on to the ser- 
vers without recording them in the control table. 

The amount of memory required by the Data Base Interface Manager for the control table is a function of
the maximum number of requests that require a response and the maximum length of time the Data Base Inter- 
face Manager is required to wait for a response before declaring a time out and returning the request to the 
requestor. 
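
The control table can be pictured as a small map from DPRB number to requestor id and request time. The sketch below is illustrative only: the class and method names are hypothetical, and time is passed in explicitly so that the time-out logic is easy to follow.

```python
from typing import Dict, List, Tuple


class ControlTable:
    """Tracks outstanding requests that still expect a data base response."""

    def __init__(self, timeout_seconds: float):
        self.timeout = timeout_seconds
        # DPRB number -> (requestor id, time of the request)
        self.pending: Dict[int, Tuple[str, float]] = {}

    def record(self, requestor_id: str, dprb_number: int, now: float) -> None:
        self.pending[dprb_number] = (requestor_id, now)

    def complete(self, dprb_number: int) -> str:
        """A response arrived: return the requestor it belongs to."""
        requestor_id, _ = self.pending.pop(dprb_number)
        return requestor_id

    def expire(self, now: float) -> List[Tuple[int, str]]:
        """Remove and return (DPRB number, requestor id) for timed-out requests."""
        timed_out = [(number, requestor)
                     for number, (requestor, started) in self.pending.items()
                     if now - started >= self.timeout]
        for number, _ in timed_out:
            del self.pending[number]
        return timed_out
```

Since an entry lives at most until its time out, the table never holds more entries than the number of response-requiring requests that can be issued within one time-out period, which is the sizing relationship stated above.
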

HOST SYSTEM

The Host system consists of the following subcomponents: 
O Data Base Management System 
O Message Router 
O VAE Data Base Servers
O Custom Servers

Data Base Management System 

The Data Base Management System consists of the Data Base Service Manager function and the indexed
sequential and sequential file structures. The Data Base Service Manager, or server, receives DPRB's from, 
and sends DPRB's to the Data Base Interface Manager. Data base servers provide various data base services 
to the ASI and the VAG. These services include retrieving, creating, deleting, and updating data in a file, as
well as performing various data base backup and recovery functions. 

The Data Base Service Manager and the indexed sequential and sequential file access methods provide
data access functions. Figure 5 illustrates an overview of the data base files and their relationship to one
another. 

Overview of the Data Base Files

A file can be accessed for services at the file level, the record level, or the field level, depending on what 
access method is used to implement the file. For example, the indexed sequential access method is used
primarily to access a file at the record level. If there is no need to access a file at the record level, the sequential
access method is used. User profile files and mailbox files are accessed at the field level. 

Message Router 

The Message Router is implemented as an AIX queue where messages are forwarded to the AIX queue
by the Data Base Interface Manager. The messages, which are the service requests from either the ASI or the
VAG, are formatted in either the direct or indirect form. The direct (or long) form is where the input parameters 
are contained within the message itself. The indirect (or short) form is where the input parameters are contained 
within the DPRB. 

In either case, the data base server searches the queue for its request based on the message type and
receives it into its task space for processing. 
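
The routing idea can be sketched with an in-memory list standing in for the AIX queue. Everything named here (the `Message` fields, the `MessageRouter` class, the `input_parameters` helper and the message type strings) is hypothetical; only the direct and indirect forms and the selection by message type come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Message:
    msg_type: str                      # identifies the target data base server
    form: str                          # "direct" (long) or "indirect" (short)
    params: dict = field(default_factory=dict)   # used by the direct form
    dprb_number: Optional[int] = None             # used by the indirect form


class MessageRouter:
    """An in-memory stand-in for the queue shared by the data base servers."""

    def __init__(self) -> None:
        self.queue: List[Message] = []

    def send(self, message: Message) -> None:
        self.queue.append(message)

    def receive(self, msg_type: str) -> Optional[Message]:
        """Remove and return the oldest message of the given type, if any."""
        for index, message in enumerate(self.queue):
            if message.msg_type == msg_type:
                return self.queue.pop(index)
        return None


def input_parameters(message: Message, dprbs: Dict[int, dict]) -> dict:
    """Direct form carries its parameters; indirect form points at a DPRB."""
    if message.form == "direct":
        return message.params
    return dprbs[message.dprb_number]
```
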

Data Base Servers 

There are four data base servers in the VAE system. They are:
O State Table and Prompt Directory Server;
O User Profile/Mailbox Server;
O SGAM Server; and 
O VAG Server. 


STATE TABLE AND PROMPT DIRECTORY SERVER: 

The State Table and Prompt Directory Server retrieves a state table, a prompt directory, or a variable
segment mapping file from the data base and returns it to the ASI through the Data Base Interface Manager. The
server uses the sequential file access method to accomplish this task.

USER PROFILE/MAILBOX SERVER: 

The User Profile Server retrieves, updates, or deletes a record of a user profile file for the ASI through the
Data Base Interface Manager. A record within a user profile file can be updated at the field level. For all
functions, with the exception of retrieval, the ASI can request the server to return an acknowledgment of completion.
The server uses the unique user id to search for a record in the user profile file. 

The Mailbox Server contains the mailbox data base that provides the link between the user profile files and 
the message files. The mailbox data base is a collection of mailbox entries, or message headers, that describe
each message that arrives at the VAE. Pertinent information that is contained in the message header includes
the message id, the sender's id, the receiver's id, and the message status code. 

SGAM SERVER: 

The Segment/Greeting/Audio Name/Message (SGAM) Server retrieves, creates, updates, deletes,
queries, renames, or copies a data unit for the ASI through the Data Base Interface Manager. A data unit can be
a voice segment, a greeting, an audio name, or a message. Each voice segment, greeting, audio name, and 
message contains a set of voice records, and can be searched by a unique key. For example, the unique key 
for a voice segment is composed of a segment id and a sequence number. 

With the exception of retrieval and query, the ASI requests the SGAM Server to return an acknowledgement
of completion. The server uses the indexed sequential file access method to search for a data unit. Voice
segments are stored in different files according to language code and compression ratio, greetings and audio
names according to compression ratio, and messages according to recording date and compression ratio. 
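
As an illustration of the keying and file partitioning just described, the sketch below models an indexed file as a dictionary keyed by (segment id, sequence number) and chooses a file from the attributes the text lists. The function names, the tuple used to identify a file and the string values for the kind of data unit are assumptions.

```python
from typing import Dict, Optional, Tuple

RecordKey = Tuple[int, int]  # (segment id, sequence number)


def file_for(kind: str, compression_ratio: int,
             language: Optional[str] = None,
             recording_date: Optional[str] = None) -> tuple:
    """Pick the file that holds a data unit, following the partitioning rules."""
    if kind == "segment":
        return ("segment", language, compression_ratio)
    if kind in ("greeting", "audio_name"):
        return (kind, compression_ratio)
    if kind == "message":
        return ("message", recording_date, compression_ratio)
    raise ValueError(f"unknown data unit kind: {kind}")


def read_unit(index: Dict[RecordKey, bytes], unit_id: int) -> bytes:
    """Collect the voice records of one data unit in sequence-number order."""
    keys = sorted(key for key in index if key[0] == unit_id)
    return b"".join(index[key] for key in keys)
```
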

VAG SERVER:

The VAG Server retrieves, creates, updates, or deletes either a record in a file or a file. An example of a 
file might be a state table or a prompt directory. An example of a record of a file might be a voice segment in 
a voice file or a record in a user profile. For all functions, with the exception of retrieval, the VAG requests the
server to return an acknowledgment of completion. The server primarily uses the sequential access method to 
access files and the indexed sequential access method to access records in files. 

Custom Servers: 

In addition to the standard VAE data base servers, the following two special servers allow customers to 
connect their own applications to the VAE system: 
O 3270 Server 
O General Purpose Server

3270 SERVER: 

The IBM 3270 Server is a turn-key method of interfacing the VAE with existing Host-based applications 
that use 3270-type displays. Other methods require the customer to develop custom interface routines. The VAE
uses the AIX Host Connection Program (HCON) to support the functions of the 3270 Server. Communication 
with the Host is provided by AIX Systems Network Architecture (SNA) services over token-ring and synchronous
data link control (SDLC) links through the RISC/6000 token-ring and multi-protocol adapter.

Up to twenty-six concurrent sessions are supported by token ring or SDLC links. Any 3270 terminal or con- 
troller supported by AIX HCON may be used by the VAE. To create an application, the customer uses the VAG 
to customize the ASI interaction with the 3270 server. To accomplish this, the customer uses VAG functions 
such as Host session parameters, logons, expected Host sequences, error handling, and data field locations 
and types. Neither programming expertise nor the use of a compiler is necessary to create VAE applications 
using the 3270 Server. 

GENERAL PURPOSE SERVER (GPS): 

The General Purpose Server (GPS) is a construct designed to provide open-ended access to the facilities 
of the VAE. It is used to support local and remote data base access through the use of custom servers. In gen- 
eral, the customer is responsible for creating the custom servers needed using the Custom Server User Inter- 
face in conjunction with his/her own programming logic and the VAE Application Generator. 

The resulting custom server operates within the Host or pseudo-Host component of VAE. Server logic is 
provided in the form of source or object modules specified by the customer using a compatible programming 
language processor, such as COBOL, FORTRAN, C, or PASCAL. Any operating system service or facility may
be used, on the condition that the design conforms to the performance requirements of the application.

Interaction with the VAE telephony environment is provided automatically using the pre-processor phase 
of server build. In a server development session, the customer's logic modules are imported, the Host interac- 
tion is specified with the front-end script, and the combined specification is submitted to the build process. 

The GPS design can be explained by dividing it into two parts: the GPS components and the GPS architecture.
This section provides a brief description of the GPS components followed by an overview of the GPS
architecture. The GPS components are: 

O APPLICATION PROGRAM INTERFACE (API) 

This interface module is part of the VAE system and is stored in a library in the VAE system data base. 
O USER APPLICATION MODULES (UAM) 

These interface modules are a collection of routines supplied by the customer to provide access to the exist- 
ing customer application system. 
O GPS PREPROCESSOR 

This is an Application Specification File parser composed of AWK programming files. 
O APPLICATION SPECIFICATION FILE (ASF) 

This file defines and specifies the parameters and information necessary to generate the custom server. 
It is created by the customer using the Custom Server User Interface. 
O APPLICATION FUNCTION/SUBFUNCTION FILES (ASF), (AFF)

These files contain the function id for the customer application system and the information for invoking the
User Application Modules.

When creating a custom server, the customer is responsible for providing the
User Application Modules and creating the Application Specification File. The GPS provides the remaining
components and all the processing necessary to generate and implement the custom server.

The GPS design architecture is illustrated in Figure 6. The GPS is divided into four stages:
O Custom server development; 
O Script processing;

O VAE system initialization; and

O Runtime facilities. 

Custom server development consists of creating the Application Specification File (ASF) and storing it in the 
VAE system data base. This is implemented using the Custom Server User Interface. When the Application
Specification File is created, it is subjected to a BUILD process. 

During the BUILD process, the GPS Preprocessor reads the Application Specification File (ASF) and gen- 
erates the Application Function/Subfunction Files and the Application Control Program. The C compiler com- 
piles the Application Control Program and a MAKE utility links it with the User Application Modules to build the 
executable file as the custom server. 

The script processing consists of generating a script with the action parameters necessary to link the appli- 
cation with the custom server. The main action parameters are the SendData and ReceiveData actions. Also, 
when generating the script, the script reads the Application Function/Subfunction Files and uses the custom
server name (bus scheduling) and subfunction names (get-city, get-schedule) to generate the script.

The Application Function/Subfunction File directs the script programmer to give input parameter(s) as
needed by the subfunctions. At VAE initialization, the system sets up the system parameters and allocates the
resources necessary to implement the custom server. Finally, at runtime, the application runs the script table
and sends the RB (request block), which contains the custom server number (function id), the transaction number
(subfunction id), and the parameter data, to the custom server. The custom server then parses the parameter data
into the input parameters and passes the input parameters to the customer application system.

When the customer application is complete, the custom server passes the RB header and the return par- 
ameter back to the VAE application. 
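
The runtime exchange described above can be sketched as follows. This is only an illustration under stated assumptions: the `RequestBlock` class, the comma-separated parameter encoding, the subfunction numbers, and the helper functions are hypothetical, since the text does not specify the RB layout.

```python
from dataclasses import dataclass


@dataclass
class RequestBlock:
    """Hypothetical RB: header fields plus raw parameter data."""
    function_id: int     # custom server number
    subfunction_id: int  # transaction number
    param_data: str      # assumed encoding: comma-separated values


def get_city(params):
    # Stand-in for a User Application Module routine.
    return "city=" + params[0]


def get_schedule(params):
    # Stand-in for a User Application Module routine.
    return "schedule=" + "-".join(params)


# Hypothetical subfunction table for a "bus scheduling" custom server.
SUBFUNCTIONS = {1: get_city, 2: get_schedule}


def custom_server(rb: RequestBlock) -> RequestBlock:
    """Parse the parameter data, call the customer logic, return header plus result."""
    params = rb.param_data.split(",") if rb.param_data else []
    result = SUBFUNCTIONS[rb.subfunction_id](params)
    # The RB header is passed back together with the return parameter.
    return RequestBlock(rb.function_id, rb.subfunction_id, result)


reply = custom_server(RequestBlock(function_id=7, subfunction_id=2,
                                   param_data="austin,dallas"))
print(reply.param_data)
```

Keeping the function id and subfunction id in the reply mirrors the statement that the RB header travels back with the return parameter.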

OPERATIONS, ADMINISTRATION AND MAINTENANCE 

Operations, Administration and Maintenance (OAM) functions for the VAE system include configuration
management, performance management, and error management. OAM performs the following functions:
O Provide a console interface for system operation and command execution;
O Statistics collection;
O Statistics reporting; and
O Error management.

Console Interface 

The OAM Console Interface provides the environment for the system administrator to continuously monitor 
the status of the VAE system. It also allows the system administrator to take appropriate action in response to 
alerts and warnings. The status information is displayed graphically and refreshed at regular intervals. Status 
information includes: 

O Percentage of buffer pool available;

O Percentage of disk space in use; 

O Number of active lines on each VSC and the status of each line; 

O System performance statistics such as voice segment cache memory requirements; and 

O System configuration information. 

The OAM Console Interface operates in the AIXwindows environment. The screen controls and options
are grouped by function and purpose and are activated by input from either the keyboard or the mouse. The 
main screen is divided into three areas: 

O Menu bar; 

O System status display area; and 
O User selected status display area. 

The system administrator may execute administrative commands to change the configuration and/or oper- 
ational characteristics of the system. Administrative commands include: 
O Start, stop, or block a channel; 
O Start or shut down the system; 
O Control system resource usage (request buffer release); 
O Request reports; 
O Change system parameters;
O Add, modify, and delete user profiles and mailboxes; 
O Define classes of service; and 
O Direct the console display to other devices.

Statistics Collection

Collection of statistics concerning system operation and resource utilization is accomplished by a process 
that executes on a periodic basis. This process reads shared memory locations and interfaces with other
processes, such as the Node Manager and the State Table Manager.

Data accessed by this process include the: 

O Buffer pool;

O Disk space; 

O Voice segment cache completion percentage; 

O State table hit/miss ratio; 

O Prompt directory hit/miss ratio; and

O Trunk error performance. 

Statistics Reporting 

In addition to the error logging process described earlier, OAM records events in an event log. The event 
log includes call completion records, console operations events, and threshold violations. 

Error Management 

The VAE error management system fields all detected errors in the system. Each hardware and software
component is designed to identify error conditions. For example, the VSC continually monitors the trunk status
and presents error conditions (in the form of alarms) to the VAE software. Similarly, each VAE software
component tests for invalid inputs, system-related failures, illegal requests, or resource availability problems. While
the architecture allows VAE to present errenroll as an interface to the custom server writer, the Prototype does
not allow the customer to use this error recovery service.

VAE SOFTWARE ERROR RECOVERY 

Coverage of VAE software identified failure conditions is targeted at 100 percent. That is, all identified error
conditions are detected and a recovery action is assigned to each. All recovery actions can be grouped into the following
five general types of recovery procedures:

O Logging; 

O Process local recovery; 
O Multiple process recovery; 
O Single process restart; and 
O System restart. 

In the event that a software module receives invalid input, the VAE may log the problem, disregard the trans- 
action, and notify the requestor of the error condition. An intermittent failure, such as a failed data base query, 
may initiate a process local recovery procedure, in which case, the requesting process may retry before escalat- 
ing the problem to OAM. 

A shortage of shared buffers in the buffer pool may require a multiple process recovery procedure. In this 
event, the requesting process notifies a system management process, such as the Node Manager, which, in
turn, requests other processes to free unused buffers to make them available to the original requestor. 

Recovery for VAE software failures caused by data corruption, logic errors, or exceeding designed
boundaries is to quiesce the Channel Process, if possible, and re-initialize it. This is an example of single process
restart. The same is true of failures caused by event overflows or a missed interrupt. The error management
system can generally restart any failed non-system management VAE process. A failure in system management
processes requires a full system restart.
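
The mapping from failure conditions to the five recovery types can be sketched as a lookup table. The failure labels, the `choose_recovery` function, and the escalation rule as coded here are assumptions made for illustration; the text gives only the examples, not an implementation.

```python
from enum import Enum


class Recovery(Enum):
    LOGGING = "logging"
    PROCESS_LOCAL = "process local recovery"
    MULTIPLE_PROCESS = "multiple process recovery"
    SINGLE_PROCESS_RESTART = "single process restart"
    SYSTEM_RESTART = "system restart"


# Hypothetical failure labels, each mapped to the recovery type the text
# gives as its example.
RECOVERY_FOR = {
    "invalid_input": Recovery.LOGGING,
    "intermittent_failure": Recovery.PROCESS_LOCAL,
    "buffer_shortage": Recovery.MULTIPLE_PROCESS,
    "data_corruption": Recovery.SINGLE_PROCESS_RESTART,
    "logic_error": Recovery.SINGLE_PROCESS_RESTART,
    "exceeded_bounds": Recovery.SINGLE_PROCESS_RESTART,
    "event_overflow": Recovery.SINGLE_PROCESS_RESTART,
    "missed_interrupt": Recovery.SINGLE_PROCESS_RESTART,
}


def choose_recovery(failure: str, system_management: bool = False) -> Recovery:
    """Pick a recovery type; a failed system management process cannot simply
    be restarted, so that case escalates to a full system restart."""
    recovery = RECOVERY_FOR[failure]
    if recovery is Recovery.SINGLE_PROCESS_RESTART and system_management:
        return Recovery.SYSTEM_RESTART
    return recovery
```

For example, `choose_recovery("buffer_shortage")` selects multiple process recovery, while `choose_recovery("logic_error", system_management=True)` escalates to a system restart.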

AIX AND SYSTEM HARDWARE ERROR RECOVERY: 

Coverage of non-VAE problems is limited to those errors detected by AIX and the hardware. AIX and system
hardware errors may or may not be recoverable. Disk and controller failures, memory checks, bus checks, and
other hard errors are almost always fatal. Fatal errors require manual intervention. When intermittent errors
occur during normal software execution, the software may retry at least once before escalating the problem to
OAM.

Failures caused by insufficient resources may or may not be fatal. A recoverable condition, such as a disk
full condition, causes all attempts to record voice or update user profiles to fail. Under such conditions, and
when more 3270 emulation sessions are required than are currently enabled, a system busy announcement
may be played, if specified by the customer script, and input is declined until resources are available.

Error conditions that occur at the telephone line trunk interface (VSC) are analyzed and a determination of
Red Alarm, Yellow Alarm, or Alarm Indication Signal is made. The occurrence of a Red Alarm causes the VAE
system to disconnect all calls on the unit and make all channels busy at the originating end for the duration of
the condition. The VAE system automatically restores service after the alarm has been cleared.

Certain conditions can trigger a flood of errors that can overwhelm the logging device (or Host). Provisions
are made to set thresholds for error reporting. These threshold settings are variable and can be reset. The
console operator is notified of the state of all channels and about all errors. The operator then diagnoses the error
and initiates the appropriate error recovery action.

SYSTEM ADMINISTRATOR FACILITIES (SAF) 

The SAF is a VAE application that allows the system administrator to establish, maintain, and support the 
following system functions: 
O Application Profiles; 
O Administrator Profiles; 
O National Language Support Setup; and 
O System Configuration. 

The SAF is menu-driven and is essentially a graphics workstation application using AIXwindows as the basis
for its design. Access to the SAF is from the main default window for the VAE applications. 

Application Profiles 

An application profile is used at ring time to set up the actions that are performed after the telephone is
answered. The most important information stored in the application profile is the state table id (the application
to run) and the entry point in the state table. The application profile is stored in the same file as the user profile. 
Unlike the user profile, it does not describe the user; instead, it describes the application. 

The application profile user interface consists of a panel where the system administrator enters the profile 
data. This panel consists of a list of the existing application profiles and the actions: ADD, DELETE, MODIFY, 
and SEARCH. On the right-hand side of the panel is a work area where the application profile information is 
entered and displayed. The system administrator may select an existing application profile from the list and
perform an action on it or, by using the ADD action, she may create a new profile.

Each application profile includes: 

O Phone number; 

O Application name; 

O Comment field; 

O Language ID; 

O State table to be used with the application; 
O Release number for the state table; and 
O Entry point into the state table. 

National Language Support (NLS) Setup

The VAE system processes, simultaneously, applications consisting of multiple languages. For this reason, 
the National Language Support (NLS) setup program is designed and implemented. The NLS setup program 
duplicates an application in a new language and then allows the system administrator the option of updating 
the application using the new language. NLS is designed in the same way as the other SAF programs using 
AIXwindows.

When NLS is selected from the SAF menu, the system administrator is able to modify screens for an existing 
language or create screens for a new language. There are two parts to the NLS program. The first part allows 
the system administrator to change the language displayed in the panel fields of the application. Using the NEXT
action on the NLS panel, she is presented with each panel field that can be translated. Examples of these fields 
are MODIFY and DELETE. 

The second part of the NLS program is the translation of the application text into the new language. The
system administrator is presented with each panel of the application, where she can change the language of
the text for each panel. Examples of the text that is translated are state purpose, action name, and edge name.

In addition to changing the language of the text for an application, the system administrator can also change 
the playback characteristics of the voice information for the chosen language. 

System Configuration 

System configuration is a menu option that allows the system administrator to edit configuration
parameters. The configuration parameters are grouped by logical component. Default configuration groups are:
O VAE configuration; 
O ASI configuration; 
O VSC configuration; and 
O Language configuration. 

The configuration panels for all the groups are the same: a list of parameters to select from appears on the
left-hand side of the screen, and on the right-hand side there is space to alter the selected parameter.

VAE CONFIGURATION: 

VAE configuration consists of system parameters that define the operating characteristics of the VAE sys- 
tem. The VAE configuration program stores and displays these system parameters so that they can be acces- 
sed by the system administrator when modifying system configuration. 

The VAE configuration program allows the system administrator to configure the system at initial startup 
and to reconfigure the system when there are modifications. System configuration includes the tasks of choos- 
ing the disk space used for recording messages and setting time outs for timed actions. Like the other programs 
discussed in this section, VAE configuration is a menu-driven program that uses a graphics panel user interface. 

ASI CONFIGURATION: 

ASI configuration consists of parameters that define the operating characteristics of ASI. The ASI
configuration program stores and displays these parameters so that they can be accessed by the system administrator
when modifying ASI configuration.

The ASI configuration program allows the system administrator to configure ASI at initial startup and to 
reconfigure ASI when there are modifications. ASI configuration includes the tasks of specifying the size of the 
4K buffer pool, determining the size of cache, and selecting the number of voice cards. Like the other programs 
discussed in this section, ASI configuration is a menu-driven program that uses a graphics panel user interface. 

VSC CONFIGURATION: 

The VSC configuration program allows the system administrator to configure the VSC at initial startup and
to reconfigure the VSC when there are modifications. VSC configuration includes the task of setting information 
related to telephone connection, such as the compression type, the number of lines per trunk, and the types 
of trunks. Like the other programs discussed in this section, VSC configuration is a menu-driven program that 
uses a graphics panel user interface. 

LANGUAGE CONFIGURATION: 

Language configuration refers to the way system configuration parameters are defined for each language. 
English functions as the basic language. Using the language configuration, the system administrator describes 
how to play variables such as numbers, dates, time, currency, and telephone numbers. For example, the voice
format for date specifies which portions of the variable are played (day of week, day of month, month of year,
and year), as well as the qualifiers for the variable (day of month as an ordinal or a cardinal number). In addition,
a Variable Mapping Table allows the system administrator to define the smallest, or "primitive," elements of
the variable (such as months of the year and days of the week).
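
A date voice format of this kind might be sketched as below. The `MONTHS` and `WEEKDAYS` lists stand in for an English Variable Mapping Table, and the function names, portion keys, and `day_as_ordinal` flag are hypothetical; the text does not define a concrete format.

```python
import datetime

# Hypothetical Variable Mapping Table for English: the "primitive" elements.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
            "Saturday", "Sunday"]


def ordinal(n: int) -> str:
    """Render a number as an English ordinal, e.g. 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def date_segments(d: datetime.date,
                  portions=("day_of_week", "month", "day_of_month", "year"),
                  day_as_ordinal=True):
    """Return the elements to play, in order, for the selected portions."""
    parts = {
        "day_of_week": WEEKDAYS[d.weekday()],
        "month": MONTHS[d.month - 1],
        "day_of_month": ordinal(d.day) if day_as_ordinal else str(d.day),
        "year": str(d.year),
    }
    return [parts[p] for p in portions]
```

Choosing the portions and the ordinal/cardinal qualifier per language is what lets one application play the same date variable differently in each configured language.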

Utilities Module

The Utilities Module provides report printing for selected system data files, such as user profiles, application 
profiles, and system administrator profiles. 

USER ADMINISTRATION FACILITIES

The UAF is an interactive program that allows the system administrator to create and maintain the: 
O User profile; 
O Messages sent list; and
O Messages received list.

User Profile 

The user profile function allows the system administrator to create and maintain the data that defines each
user. This data is contained in the user profile. Each user profile is uniquely identified by a user id (phone num- 
ber/extensions) and a mailbox id or id's (a user can have more than one mailbox). Using the user profile function, 
the system administrator can: 

O Search for a user profile in the user profile list;
O Select a user profile from the user profile list;
O Specify the contents of the user profile data fields;
O Add a user profile to the user profile list;
O Modify an existing user profile in the user profile list;
O Delete a user profile from the user profile list; and
O Select a mailbox maintenance function.

Messages Sent from a Mailbox 

This is a mailbox maintenance function that lists the messages sent by each user in the VAE system. Using 
the messages sent list, the system administrator can search for a message in the list.

Messages Received in a Mailbox 

This is a mailbox maintenance function that lists the current messages received by a user in a given
mailbox. Using this function, the system administrator can:
O Search for a message in the list; 
O Delete selected messages from the mailbox; and 
O Clear all messages from the mailbox. 

VOICE APPLICATION GENERATOR

The Voice Application Generator is the tool used for the Voice Application Enabler (VAE) application gen- 
eration. It is a graphics workstation typically used by the application developer to create, modify, and customize 
voice applications. The VAG main functions are: 
O create and modify voice segments;
O create prerecorded system messages known as prompts;
O create state tables that promote the application flow and define edge and default conditions;
O maintain and test voice applications;
O provide links between the 3270 Server and other Host-based applications; and
O build and implement the custom server.

Through the VAG facility, the application developer creates the original default scripts and prompts when 
the VAE system is first implemented. This function enables the creation of applications that allow the VAE to 
function, among its many other application capabilities, as an answering machine, voice messaging system, 
and voice response unit. 

The VAG incorporates a SEARCH function and an on-line HELP system. The SEARCH function provides
a list of search actions for all appropriate VAG routines. The HELP system provides: 
O A menu-driven, context-sensitive help panel for each menu; and
O An item-level help screen for selected items on each menu. 

A display technique that emphasizes selected words in the help text is also provided. The displayed help
text is language dependent. The VAG is multi-lingual, menu-driven, and consists of the:
O Application Manager;
O Voice Generator;
O Prompt Generator; and
O State Generator.

Application Manager

The Application Manager maintains the applications that are available in the VAE system. It also manages
the state tables, prompt directories, and voice segments that define an application. The Application Manager
allows the user to perform the following functions:
O Select an application from the application list;
O Delete an existing application and all its component parts;
O Delete a state table;
O Delete a prompt directory;
O Debug a state table;
O Provide a custom server interface;
O Save a complete application to archive media; and
O Restore a complete application from archive media.

STATE TABLE DEBUGGER: 

The state table debugger provides the functions that allow the system administrator to verify the
functionality of a given state table, and to determine if and where problems exist within a given state table.

CUSTOM SERVER USER INTERFACE: 

The Custom Server User Interface (CSUI) is the tool used for creating, building, and maintaining custom
servers. It is a graphics workstation typically used by the application developer to develop the Application
Specification File. This Application Specification File is the module GPS uses to build the custom server. From the
custom server application main screen, the application programmer can elect to browse or modify an existing
custom server, or create a new server.

Voice Generator

The Voice Generator is used to create/modify/delete the basic unit of voice. This basic unit of voice is a
word, phrase, sentence, or set of sentences and is called a voice segment. There is both a textual
representation and an audible representation of the voice segment. The voice segment identifier is the link between the
text and the audible voice segment.

The Voice Generator is an interactive program that allows the application developer to: 
O Create, modify, delete, and display the textual representation of the words, phrases and sentences
recorded as voice segments; and
O Create, modify, delete, and display the digitized voice segments.

The Voice Generator program is divided into two main parts: text editing and voice editing. Text editing
consists of editing voice segments in the form of text, while voice editing consists of editing the audible voice
segments in the form of digitized voice signals. The Voice Generator user interface consists of two work panels
where the text and the digitized voice data are entered.

The first panel, illustrated in Figure 7, is the VAG Prompt Segment Editor. This display provides the user
with a highly visual, user-friendly method of listing and maintaining the textual representation of the voice
segments. Using a mouse and keyboard, the user can enter and modify textual representations of voice segments
in a simple and straightforward manner. Then, by simply clicking a mouse button when the cursor is positioned
on MOD VOICE, the user can change the panel from the textual representation of the voice segment to the
digitized voice signal panel. In fact, a major feature of the Voice Generator is the ease with which the user can
go from the text to the digitized version of the voice segment, thus working on both versions at the same time.

The second panel is a VAG Digitized Voice Editor panel similar to the panel illustrated in Figure 7. It allows
the user to:

O Select and display one or two voice segments; 

O Display all or selected portions of a voice segment and, using a zoom action, magnify the digitized voice signal
for a closeup view;

O Create new voice segments from existing voice segments;
O Delete selected portions of a voice segment;
O Copy selected portions of a voice segment to different places in the voice segment;
O Copy selected portions of one voice segment into another voice segment;
O Switch back and forth between two windows to edit two voice segments simultaneously; and
O Play back and record a voice segment.

Prompt Generator 

The Prompt Generator is used to create/modify/delete prompts. These prompts are the recorded sentences 
or set of sentences that are presented to the subscriber when she communicates with the telephony system. 
The Prompt Generator is an interactive program that allows the application developer to: 

O Create, modify, delete, and display prompt directories; and 

O Create, modify, delete, and display individual prompts. 

The Prompt Generator program is divided into two main processes: defining prompt directories and defining 
the prompts that are listed in the directories. The Prompt Generator user interface consists of three primary
work panels where the information necessary to define the prompts and prompt directories is entered.

The first and second panels are the VAG Prompt Directory Editor and the VAG Prompt List Editor panels. 
These user interface panels provide the user with a highly visual, user-friendly method of listing and maintaining 
both the prompt directory and the prompts themselves. Using a mouse and keyboard, the user can enter and 
modify the prompts and prompt directories in a simple and straightforward manner. 

The third panel is the Prompt panel, illustrated in Figure 8, which is essentially the work area where the
prompts are created. It provides the application developer with a list of voice segments 510 from which to build
a prompt and the tools to create conditional tests within a prompt. It also provides a list of variables 500 when
variables must be played in a prompt. These tools are presented as dialog windows that display the required
information. All the user has to do is select the necessary information from the dialog windows. This user
interface design provides a highly convenient and efficient way to build prompts from voice segments, variables,
and conditional tests.

State Generator 

The State Generator is a tool used to create/modify/delete state tables and states. A state is one stage or 
step in a logical sequence of actions that comprises a telephony application. A state table is a table comprised 
of these states. A state table provides the VAE with the basic rules to run the application through states, actions, 
parameters, and edge values. 
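
The relationship between states, actions, parameters, and edges can be pictured with a small interpreter. This is a sketch only: the `State` class, the `run_state_table` function, the toy action callables, and the rule that a state with no matching edge ends the run are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class State:
    action: str                                           # action name
    params: List[str] = field(default_factory=list)       # action parameters
    edges: Dict[str, int] = field(default_factory=dict)   # edge value -> next state


def run_state_table(table: Dict[int, State],
                    actions: Dict[str, Callable[..., str]],
                    start: int = 0,
                    max_steps: int = 100) -> List[str]:
    """Run states in sequence; each action returns the edge to follow next."""
    trace = []
    current = start
    for _ in range(max_steps):
        state = table[current]
        trace.append(state.action)
        edge = actions[state.action](*state.params)
        if edge not in state.edges:
            break  # assumed: no matching edge means the application ends
        current = state.edges[edge]
    return trace


# Toy table: play a prompt, read one key, record, then close the session.
table = {
    0: State("PlayPrompt", ["1"], {"0": 1}),
    1: State("GetKey", ["key_buf"], {"1": 2, "HUP": 3}),
    2: State("RecordVoice", [], {"0": 3}),
    3: State("CloseSession"),
}
# Stand-in actions that always return a fixed edge.
actions = {
    "PlayPrompt": lambda prompt_id: "0",
    "GetKey": lambda buffer_name: "1",
    "RecordVoice": lambda: "0",
    "CloseSession": lambda: "0",
}
trace = run_state_table(table, actions)
```

In this toy run the trace visits PlayPrompt, GetKey, RecordVoice, and CloseSession in that order.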

The State Generator is an interactive program that creates and updates state tables and states. It consists
of the VAG State Table Editor and VAG State Editor panels. The VAG State Editor panel is illustrated in Figure
9. These panels allow the user to:

O View the complete state table including the state number and purpose; 

O Create, add, modify, or delete a state table; 

O Select a state by number from the list displayed and view the details on the second panel; 
O Set default options such as preassigned edge values; 

O Display a complete list of actions and their rules, such as inputs needed that include parameters and
required edges;

O Modify an existing state; 

O Delete an existing state; and 

O Create a new state. 

When modifying or adding a state, the user can select an action from the list of displayed actions or
enter the action in the field provided on the panel. The parameters, if required, are also selected from a list
displayed on the panel 600. Then, the State Generator prompts for the edges 610 that are required by the
selected action.

When the application is complete, a function called check consistency is enabled. Check consistency moni- 
tors: 

O The validity of the edges that reference an existing state; 

O The validity and existence of the parameter(s), if needed. For each prompt, a test is performed in all
defined languages to verify the existence of this prompt; and

O The minimum of logic required for a state, such as the existence of a CloseSession action.
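
The three checks listed above could be sketched along these lines. The dictionary layout of the state table, the `prompts_by_language` argument, and the wording of the reported problems are hypothetical; the text does not describe how check consistency is implemented.

```python
def check_consistency(table, prompts_by_language):
    """Return a list of problems found in a state table.

    table: {state_no: {"action": str, "params": list, "edges": {edge: state_no}}}
    prompts_by_language: {language: set of prompt ids defined in that language}
    """
    problems = []
    for number, state in table.items():
        # 1. Every edge must reference an existing state.
        for edge, target in state["edges"].items():
            if target not in table:
                problems.append(
                    f"state {number}: edge {edge} references missing state {target}")
        # 2. Each prompt must exist in all defined languages.
        if state["action"] == "PlayPrompt":
            prompt_id = state["params"][0] if state["params"] else None
            for language, prompts in prompts_by_language.items():
                if prompt_id not in prompts:
                    problems.append(
                        f"state {number}: prompt {prompt_id} missing in {language}")
    # 3. Minimum logic: the table must contain a CloseSession action.
    if not any(s["action"] == "CloseSession" for s in table.values()):
        problems.append("no CloseSession action")
    return problems


table = {
    0: {"action": "PlayPrompt", "params": [12], "edges": {"0": 1}},
    1: {"action": "GetKey", "params": ["key_buf"], "edges": {"1": 5, "HUP": 2}},
    2: {"action": "CloseSession", "params": [], "edges": {}},
}
problems = check_consistency(table, {"English": {12}, "French": set()})
```

Here the check reports two problems: prompt 12 is not defined in French, and state 1 has an edge to a state 5 that does not exist.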

VAE APPLICATION ACTIONS

The following is a collection of the actions that are used when defining a state table. 
O Idle (Internal Action)
O AnswerCall (Internal Action)
O EndCall (Internal Action)
O PlayPrompt
O GetKey
O GetData
O GetText
O GetFindName
O GetFindPassword
O EvaluateData
O AssignData
O PlayVoice
O RecordVoice
O SaveVoice
O DeleteVoice
O CheckStorage
O CheckMailbox
O UpdateMailbox
O UpdateUserProfile
O SendData
O ReceiveData
O GetFindData
O CallStateTable
O ExitStateTable
O Disconnect
O CloseSession

Each of these actions is discussed below in sections bearing the name of the action.



Idle (Internal Action)

The Idle action is reserved for the internal state table only. This action is run by a Channel Process when
it has nothing to do. It causes the process to sleep while waiting on a semaphore to be notified by the Node
Manager. When notified, Idle determines whether the Node Manager is requesting the process to answer a
call or place a call, and then returns the appropriate edge to the state machine.
PARAMETERS NONE.
EDGES
0 = AnswerCall;
1 = PlaceCall; and
HUP = EndCall.



AnswerCall (Internal Action)

The AnswerCall action is reserved for the internal state table only. This action initializes the Channel
Process to process an incoming call. It retrieves the user profile for the called telephone number, the state table
defined in the user profile, and the prompt directory defined in the state table. Then it answers the phone and
invokes the State Machine with the state table and the starting edge that is defined in the user profile.
PARAMETERS NONE.
EDGES EndCall.



EndCall (Internal Action)

The EndCall action is reserved for the internal state table only. This action cleans up when a session is
complete. It ensures that all lines used by this session have been reset (placed on-hook) and reassigns them
in the line table back to the Node Manager.
PARAMETERS NONE.
EDGES Idle.

PlayPrompt

This action builds and plays the voice segments that are defined in the prompt directory. Prompts are played
in the language specified in the application profile. Prompts consist of:
O The prompt id number. This number identifies the prompt and is used as the PlayPrompt parameter.
O The force play option. This number indicates whether or not a prompt is force played.
O The time out option. This number specifies the number of seconds the user has to respond at the next
GetKey or GetData action.
O The repeat option. This specifies the number of times a prompt is repeated before a T2 time out. A T2
time out initiates a Disconnect action.
O A list of segment ids, system variables, and conditional tests. The conditional test controls what segments
and variables are to be played for a particular prompt based on conditions at run-time.

For example, PlayPrompt is used any time the system interacts with the user to give information and directives,
or to answer questions. For instance, in a voice messaging system, if the user selects the LISTEN option, the
prompt might be: "You have no new messages. You have four old messages. Your oldest message is two weeks
old."

PARAMETERS 

PROMPT NUMBER. There is no default.

FORCE PLAY. To force play means to play a prompt completely even when interrupted by a keyed input.
Force play is used if there is an important message, such as voice storage is full. In this
case, the message is not interrupted. The force play flag exists in the prompt directory.
In this case, the force play parameter overrides the flag set in the prompt directory. The
default is do not force if a key on the phone pad is pressed. This means that the user can
interrupt the prompt by his keyed input if he does not want to listen to the prompt in its
entirety. The system then automatically goes to the next step of the application.

EDGES 

0 = The prompt has played completely except in the case where a prompt was interrupted by a keystroke. 
In this case, the prompt does not play completely. 

1 = There is a voice channel problem. 
HUP = The caller has hung up.

GetKey 

GetKey is used to receive a keyed input from the caller when a choice of options was given in the previous
prompt. The previous prompt also provides GetKey with the number of seconds to wait before time out occurs
and the maximum number of times to repeat this prompt before time out. The GetKey action recognizes a single
keyed input only.

For example, if the prompt is: "To record, press 1; to listen, press 3; to change personal options, press 8;
to transfer out of the system, press 7, then press #", the logical state processed after the above PlayPrompt
action is the GetKey action. In this example, key 1 activates a record session, key 3 activates a listen session,
key 8 activates the personal options session, key 7 transfers out of the system, while all other keys execute a
PlayPrompt stating, "I do not understand this command. Please try again."



PARAMETERS

BUFFER NAME. The keyed input is stored in this buffer for future use.

EDGES

# = User pressed key #

T1 = Time out

T2 = Last repeat time

HUP = The caller has hung up.



GetData 



Where the GetKey action is used for a single key input, the GetData action enables the application to
receive several keys in a single state step. When completing the input, the last key pressed must be the # key.
This action accepts the keyed inputs and stores them in a variable. An edge is returned to reflect the status of
the input. The previous prompt also provides GetData with the number of seconds to wait before time out occurs
and the maximum number of times to repeat this prompt before time out.

For example, if the prompt is: "Please enter your new password, it must be from five to eight characters
long", the GetData action requires a keyed input that is a minimum length of five characters and a maximum
length of eight characters. The caller must enter a password from five to eight characters long, followed by
pressing the # key.

PARAMETERS 

PARM 1: Buffer name in which the input is stored.
PARM 2: If the buffer is a character string, this is the minimum length of the input. If the buffer is numeric, this
is the minimum value of the input.
PARM 3: If the buffer is a character string, this is the maximum length of the input. If the buffer is numeric, this
is the maximum value of the input.

EDGES 

0 = Input length or value is correct

1 = Input length too short or value too small

2 = Input length too long or value too large

3 = Input incomplete. The # key was not pressed.

T1 = Time out. Nothing was entered.

T2 = Last repeat time

HUP = The caller has hung up.
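
The edge selection described above can be sketched as follows. This is a minimal illustration for a character-string buffer only, not the system's actual implementation: the function name `get_data_edge`, its arguments, and the reading of T2 as "time out on the last permitted repeat" are assumptions made for the example.

```python
def get_data_edge(keys, min_len, max_len,
                  timed_out=False, last_repeat=False, hung_up=False):
    """Return a GetData-style edge label for a string of keyed input.

    keys is the raw key sequence, expected to end with '#'.
    All names here are hypothetical; only the edge labels come from the text.
    """
    if hung_up:
        return "HUP"                      # caller has hung up
    if timed_out and not keys:
        # Assumption: T2 is returned when the last allowed repeat times out.
        return "T2" if last_repeat else "T1"
    if not keys.endswith("#"):
        return "3"                        # input incomplete, '#' not pressed
    entered = keys[:-1]                   # strip the terminating '#'
    if len(entered) < min_len:
        return "1"                        # too short
    if len(entered) > max_len:
        return "2"                        # too long
    return "0"                            # length is correct


# Password example from the text: five to eight characters, then '#'.
print(get_data_edge("12345#", 5, 8))
```

A numeric buffer would compare values rather than lengths against PARM 2 and PARM 3; that case is omitted here.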

GetText 



The GetText action works much like the GetData action. This action enables the application to receive
ASCII text data as input from the DTMF keypad. Two DTMF keys are pressed by the caller to designate a single
ASCII character. When entry is completed, the caller must press the # key.

The ASCII text entered during this action is stored in a character buffer. An edge is returned to reflect the
status of the entry: too short, too long, or time out occurred while waiting for the input.

PARAMETERS

PARM 1: Buffer name where the input is to be stored;
PARM 2: Minimum length of the input; and
PARM 3: Maximum length of the input.

EDGES

0 = Input successful

1 = Input too short

2 = Input too long

3 = Input incomplete, time out occurred

T1 = Time out, no input yet

T2 = Last repeat time

HUP = The caller has hung up.

GetFindName 

This action is used any time a caller or a message receiver must be identified by digit name or extension
number. A digit name is a representation of the user's last name when it is spelled using the alphanumeric key
pad on the telephone. GetFindName is used when the caller logs on to system services or when the caller sends
messages through system services.

An illustrative embodiment of the present invention concerns a connected-word and -digit (hereinafter
"connected-word") speech recognizer for information services. The embodiment exploits the idea that user
speech in the information service context is often predictable, for example, from past speech of the user, or
from constraints on or the nature of a request for information. Via a training or initialization procedure, one or
more lists (i.e., databases) of connected-word speech are built and maintained. A list comprises the likely spoken
responses to a given request for information by the information service. For each connected-word speech
recognition task, recognition is performed in the first instance by reference to the list or set of likely responses
to that request. The unknown connected-word speech is compared to the entries in the list by assembling for
each list entry appropriate reference patterns (as specified by each list entry) and by using a time alignment
procedure such as Dynamic Time Warping. Each comparison to a list entry yields a comparison score. The
unknown speech is recognized as the list entry with the best score below a user-specified or machine-determined
threshold. For those occasions when no comparison score is below the threshold (or when two or more
scores are below the threshold), one or more back-up procedures are provided.

Brief Description of the Drawings

Figure 1 presents an illustrative tree structure of a user interface for an information service.
Figure 2 presents an illustrative embodiment of the present invention.
Figure 3 presents a speech recognizer as an illustrative embodiment of the present invention.
Figure 4 presents an illustrative data structure for a list stored in the memory of the recognizer presented
in Figure 3.
Figure 5 presents an illustrative data structure for word patterns stored in the memory of the recognizer
presented in Figure 3.
Figure 6 presents an exemplary sequence of feature vectors as specified by an exemplary list response
and associated word patterns presented in Figures 4 and 5, respectively.
Figures 7 and 8 present a flow chart of an illustrative process executed by a processor of the recognizer
presented in Figure 3.
Figure 9 presents an illustrative graph of a Dynamic Time Warping alignment path, w(n).
Figure 10 presents an illustrative embodiment of a connected-digit speech recognizer for a telephone
repertory dialer.
Figures 11 and 12 present a flow-chart of the operation of the processor of the illustrative embodiment
presented in Figure 7.

Detailed Description

Generally, user interfaces for information services operate according to a logical tree structure. Figure 1
presents a diagram of such a tree 10. The tree 10 includes nodes 15, branches 20, and tasks 25. Each node
15 represents an explicit or implicit request for information put to the user by the information service. Each
node 15 is related to other nodes 15 by one or more branches 20. Each task 25 represents a function performed
by the service for the user. As such, a series of requests made and responses given defines a logical path
through nodes 15 and branches 20 of the tree 10 specifying a task 25 to be performed. Since each node 15
represents a request for information, each node 15 may also represent a task of resolving uncertainty in a
response.

Figure 2 presents an illustrative embodiment 50 of the present invention. The embodiment 50 provides a
comparator 51 and database 52. Database 52 comprises one or more likely responses to one or more requests
for information (represented by nodes 15) put to an information service user. Information 53 is received from
a service user via an input device in response to a service request and is provided to the comparator 51. To
resolve uncertainty in received information 53, the comparator 51 provides control/data signals 55 to scan the
database 52 for likely responses 56 associated with the request 54 which provoked user response information
53. The comparator 51 compares each likely response 56 from database 52 with the received information 53
to determine which likely response 56 most closely corresponds to the received response 53. (Alternatively,
the comparator 51 may tentatively identify the received response 53 as the closest likely response 56 and
wait for some user interaction concerning a right of refusal; or, the comparator 51 may identify the received
response 53, tentatively or otherwise, as the first likely response 56 associated with the request encountered
in the database 52 with a measure of similarity within a range of acceptable similarity scores.)
The comparator 51 outputs the determined likely response as the identified response 57.

A Speech Recognizer

Figure 3 presents a connected-word speech recognizer as a further illustrative embodiment of the present
invention. The recognizer 100 comprises input device 101 (e.g., a microphone of an I/O device), an
analog-to-digital (A/D) converter 102, processor 103, and memory 104. Memory 104 stores, among other things, one or
more lists of likely responses to a request for information associated with a given node 15. Also shown in Figure
3 is a utilization device 105 to receive the response corresponding to the recognized speech. This utilization
device 105 represents an information service. A bus 106 interconnects the A/D converter 102, the processor
103, the memory 104, and the utilization device 105. The A/D converter 102, processor 103, memory 104,
and utilization device 105 may be located locally to the input device 101. Alternatively, one or more of these
may be located at some distance and coupled to the local devices by a network.

Prior to considering the operation of the illustrative embodiment of Figure 3, it will be instructive to consider
the contents of memory 104 as they concern a list and associated word patterns for recognizing speech.
The illustrative speech recognizer presented in Figure 3 exploits the idea that a request for information
by an information service often provokes a spoken response which is predictable, for example, from past
recognized (or "decoded") speech of the user or from constraints on or the nature of the request for information.
Via one or more techniques discussed below, a list of likely responses to a given request for information is
determined and stored in memory 104. Each likely response in the list comprises a series of one or more
references to word patterns (e.g., word templates or statistical models) stored separately in memory 104. Each
word pattern represents a word used in a likely response. A multiple-word likely response therefore comprises
references to multiple word patterns.

Each word pattern stored in memory 104 comprises or is based on one or more speaker-independent or
-dependent feature vectors. The feature vectors of a word pattern represent the salient spectral properties of
the word in question. One type of feature vector comprises a mean of one or more spectral vectors, each of
which is derived from a time-aligned slice (or "frame") of a sample (or "token") of given speech. For example,
each feature vector may represent a 45 msec. frame of speech (i.e., a 45 msec. slice of a word), with adjacent
frames separated by 15 msec. on center. Together, feature vectors for successive frames form a word pattern
"template." Another type of feature vector includes a mean and covariance of a grouping of successive spectral
vectors in a given token, determined over several tokens. Such means and covariances are used in statistical
models of speech, such as the hidden Markov model (HMM) known in the art.
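
As a rough illustration of the framing just described (45 msec. frames whose centers are 15 msec. apart), the following sketch slices a sampled signal into overlapping frames. The function name `frame_signal`, the 8000 Hz sampling rate used in the usage lines, and the choice to drop a trailing partial frame are assumptions for the example; the text does not specify them.

```python
def frame_signal(samples, sample_rate, frame_ms=45.0, hop_ms=15.0):
    """Split a list of samples into overlapping frames.

    frame_ms is the frame duration and hop_ms the spacing between adjacent
    frames (both taken from the example in the text). A trailing partial
    frame is dropped, which is an assumption of this sketch.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


# One second of dummy samples at an assumed 8000 Hz sampling rate.
signal = list(range(8000))
frames = frame_signal(signal, 8000)
print(len(frames), len(frames[0]))
```

Each frame would then be passed to a feature measurement technique to yield one feature vector per frame.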

Feature vectors (for templates or statistical models) for a given word pattern may be obtained with any of
several feature vector measurement techniques well known in the art, for example, Linear Predictive Coding.
For a discussion of feature measurement techniques, see L.R. Rabiner and S.E. Levinson, Isolated and
Connected Word Recognition - Theory and Selected Applications, Vol. Com-29, No. 5, I.E.E.E. Transactions On
Communications, 621-59 (May 1981); see also, L.R. Rabiner and R.W. Schafer, Digital Processing of Speech
Signals, 396-455 (1978).

Illustrative data structures concerning list and word pattern storage in memory 104 are presented in
Figures 4a and 4b. As shown in Figure 4a, a list comprises V likely responses to a given request for information
(such that the list is indexed by v, 1 ≤ v ≤ V). Each likely response (or list entry), R_v, comprises a certain
number, L(v), of references to word patterns also stored in memory 104 (such that each likely response, R_v, is
indexed by l, 1 ≤ l ≤ L(v), and each R_v(l) references a particular word pattern in memory 104).

As shown in Figure 4b, word pattern storage comprises W word patterns (such that the storage is indexed
by w, 1 ≤ w ≤ W) which are used in forming the responses of an associated list. Each word pattern, P_w,
comprises a certain number, J(w), of feature vectors (such that each pattern, P_w, is indexed by j, 1 ≤ j ≤ J(w)),
and each P_w(j) references a particular feature vector in a word pattern.

A given response or list entry, R_v, can therefore be represented as a sequence of feature vectors, S_v(m),
the sequence determined by the sequence of word patterns, P_w, specified by the response, R_v, and the
sequence of feature vectors, P_w(j), forming each word pattern. Thus, a given response or list entry comprises
M(v) feature vectors (S_v(m), 1 ≤ m ≤ M(v)).

Figure 4c presents an exemplary sequence of feature vectors, S_v. The sequence presented, S_4, is that
specified by response or list entry R_4, which references word patterns P_2, P_5, and P_4, respectively, as shown
in Figures 4a and 4c. Each of the referenced word patterns comprises feature vectors as specified in Figure
4b. Figure 4c shows a sequence of 12 feature vectors (M(4) = 12) which make up the string, S_4.
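
The relationship between a list entry, its referenced word patterns, and the resulting feature vector sequence can be sketched as below. This is an illustration only: the figures are not reproduced here, so the pattern lengths J(2) = 3, J(5) = 5, and J(4) = 4 are hypothetical values chosen to give M(4) = 12, and the feature vectors are stand-in labels rather than real spectral data.

```python
# Hypothetical word pattern storage: pattern index w -> list of feature vectors.
# Strings stand in for real feature vectors; the lengths are assumed.
patterns = {
    2: ["P2(1)", "P2(2)", "P2(3)"],
    4: ["P4(1)", "P4(2)", "P4(3)", "P4(4)"],
    5: ["P5(1)", "P5(2)", "P5(3)", "P5(4)", "P5(5)"],
}

# List storage: response index v -> ordered references to word patterns.
# Entry 4 references P_2, P_5, and P_4, as in the example in the text.
responses = {4: [2, 5, 4]}


def assemble_sequence(pattern_refs, pattern_store):
    """Concatenate the feature vectors of each referenced word pattern."""
    sequence = []
    for w in pattern_refs:
        sequence.extend(pattern_store[w])
    return sequence


s4 = assemble_sequence(responses[4], patterns)
print(len(s4))  # M(4) under the assumed pattern lengths
```

Storing references rather than copies means a word pattern shared by several list entries is held in memory only once.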

The operation of the illustrative embodiment of Figure 3 may now be discussed with reference to Figure
5. Figure 5 presents a flow chart 200 showing an illustrative process executed by processor 103 of the
recognizer 100. Responsive to receiving a START signal from the utilization device 105 over bus 106, processor
103 begins its process by checking for the receipt of a digital version of unknown speech to be recognized (see
Fig. 5, 210). Unknown speech is received by input device 101 and provided to the A/D converter 102 as analog
signal input, s(t). The A/D converter 102 provides a digital signal version of the unknown speech, s(k).

Once s(k) is available, the processor 103 performs spectral feature measurement processing on the digital
signal, s(k), to produce a series of feature vectors, T(n), of received information. The received information
feature vectors are referred to as the "test pattern," where n indexes the individual feature vectors of the pattern.
The feature vectors are obtained with the same technique as employed in generating the feature vectors of
the word patterns stored in memory 104 (e.g., Linear Predictive Coding), and have the same frame duration
and frame spacing. Feature vectors, T(n), are representative of salient spectral properties of the unknown
speech signal, s(t). Thus, the test pattern may be categorized as received information. Test pattern feature
vectors, T(n), are stored in memory 104 (see Fig. 5, 220).

To recognize a test pattern of unknown speech, the processor 103 compares the test pattern to each of
the V likely responses contained in the appropriate list for the request. Each comparison takes into account
the similarity of the feature vectors of the test pattern, T(n), and those feature vectors, S_v(m), formed by a
series of one or more word patterns specified by a likely response in the list. The comparison is made by a
technique known in the art as dynamic time alignment.

Assuming the list contains one or more likely responses (see Fig. 5, 230), the processor 103 begins the
time alignment process with the series of word patterns of the first likely response in the list, R_1(l), for
1 ≤ l ≤ L(1) (see Fig. 5, 235). Time alignment is performed between the test pattern feature vectors, T(n), and a
sequence of feature vectors, S_1(m), formed by the series of word patterns specified by the first likely response
(see Fig. 5, 240; see also the Dynamic Time Alignment section below and Figure 6). A comparison score,
D_1, indicating the similarity or distance of the likely response with the test pattern is generated and saved (see
Fig. 5, 245). The process is repeated for each of the likely responses in the list, R_v, 2 ≤ v ≤ V. As a result, a
set of comparison scores, D_v, 1 ≤ v ≤ V (see Fig. 5, 250) is determined. The list response which yields the
best comparison score, D*, below a threshold is deemed to be the recognized response, R* (see Fig. 5, 255,
260).
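
The selection step described above (score every list entry, keep the best, accept it only if it falls below the threshold) can be sketched as follows. This is a minimal illustration, not the patented process itself: the function `recognize`, the pluggable `score_fn`, and the toy length-difference score in the usage lines are assumptions; in the text the score comes from dynamic time alignment.

```python
def recognize(test_pattern, likely_responses, score_fn, threshold):
    """Return (recognized_response, best_score).

    recognized_response is None when the list is empty or when the best
    (lowest) score is not below the threshold, signalling that a back-up
    procedure would be needed.
    """
    if not likely_responses:
        return None, None
    scores = [score_fn(test_pattern, r) for r in likely_responses]
    best_index = min(range(len(scores)), key=scores.__getitem__)
    best_score = scores[best_index]
    if best_score < threshold:
        return likely_responses[best_index], best_score
    return None, best_score


# Toy stand-in for a comparison score: difference in sequence length.
toy_score = lambda t, r: abs(len(t) - len(r))
print(recognize("abcd", ["ab", "abc", "abcdefg"], toy_score, threshold=2))
```

Returning the best score even on rejection lets a caller log how close the failed match was before invoking a back-up procedure.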

The threshold value may be set arbitrarily or as part of a training procedure for words in pattern storage. A
typical value for the threshold corresponds to one standard deviation (1σ) of word pattern or "token"
comparison scores above a mean comparison score determined during a training process for word patterns stored in
memory 104 (see discussion of training in List and Word Pattern Storage section below).
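
A threshold of this kind can be computed from training scores as sketched below. The function name `training_threshold` and the use of the population standard deviation (`statistics.pstdev`) are assumptions of this example; the text does not say whether a population or sample estimate is intended.

```python
import statistics


def training_threshold(training_scores, num_sigma=1.0):
    """Threshold = mean comparison score + num_sigma standard deviations.

    Uses the population standard deviation, which is an assumption here.
    """
    mean = statistics.mean(training_scores)
    sigma = statistics.pstdev(training_scores)
    return mean + num_sigma * sigma


# Hypothetical token comparison scores gathered during training.
print(training_threshold([2, 4, 4, 4, 5, 5, 7, 9]))
```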

If the comparison score, D*, is below the threshold (meaning that a good recognized response has been
found), the recognized response, R*, is output to the utilization device (information service) 105. If desired,
the comparison score, D*, may be output as well (see Fig. 5, 260 and 280).

If the comparison score, D*, is not below the threshold, or if the list does not contain any likely responses,
one or more back-up procedures are used to recognize the speech. A response corresponding to recognized
speech from a back-up procedure is then output to the utilization device (information service) 105 (see Fig.
5, 265, 270, 290). One back-up procedure which may be used comprises user manual entry of the information
(see Fig. 5, 275). This may occur in response to a prompt of the user by the system via the I/O device. For a
given embodiment, user manual entry may be the only back-up procedure needed.

Whether speech is recognized by the list or recognized or supplied by a back-up procedure, the list and 
pattern storage may be updated to incorporate statistics of response usage or to expand the list (in the case 
of back-up speech recognition) such that succeeding iterations of the speech may be recognized without re- 
sorting to a back-up scheme (see Fig. 5, 295). Thus, a "new" response may be added to the list as a set of 
references to stored word patterns, and test pattern information may be used to provide additional training for 
word patterns in pattern storage. 

Also, as an option, an embodiment of the invention may provide the user with an opportunity to reject a 
recognized response. Under such circumstances, another automatic speech recognition process may be in- 
voked or manual entry of the equivalent of spoken words can be performed. 

Dynamic Time Alignment 

The dynamic time alignment referenced above and in Figure 5, 240, can be accomplished by any of the
techniques well-known in the art. An exemplary technique for performing one form of dynamic time alignment,
namely Dynamic Time Warping (DTW) based on word templates, is discussed with reference to Figure 6 which
presents a grid of points in a coordinate system. A sequence of feature vectors which make up the test pattern,
T(n), is mapped to the abscissa (the independent variable) (see, e.g., Figure 4c) and a sequence of feature
vectors, S_v(m), which make up a likely response, R_v, is mapped to the ordinate (the dependent variable). Each
point in the grid represents the similarity or correspondence between the nth feature vector T(n) of the test
pattern and the mth feature vector S_v(m) of the sequence of vectors of the likely response, R_v. A measure of
similarity may be obtained according to the Itakura log likelihood ratio, as described in the article by F. Itakura
entitled, "Minimum Prediction Residual Principle Applied to Speech Recognition", I.E.E.E. Transactions on
Acoustics, Speech, and Signal Processing, Vol. ASSP-23, No. 1, pages 67-72, February, 1975:

$$ d(T(n), S_v(m)) = \log\left[ T(n) \cdot S_v(m) \right] \tag{1} $$

i.e., a log of the dot product of the two vectors T(n) and S_v(m).

The quantity d is referred to as the "local distance" because the magnitude of d increases as the
correspondence between T(n) and S_v(m) decreases (of course, other measures of similarity may be used, such as
correlation coefficients which increase as the correspondence between T(n) and S_v(m) increases).

Since test pattern feature vector index n is defined to be the independent variable, the likely response
feature vector index m may be written equivalently as a function of n, that is,

$$ m = w(n), \tag{2} $$

where w(n) represents a path through the grid as shown in Figure 6. The local distance, d, of equation (1) may
therefore be written as d(T(n), S_v(w(n))).

In order to optimally align the test pattern feature vectors with the sequence of feature vectors of a likely
response, the sum of the local distance signals d(T(n), S_v(w(n))) between the feature vectors of test pattern,
T(n), and the likely response, S_v(w(n)), is minimized:

$$ D_v = \min_{w(n)} \frac{1}{N} \sum_{n=1}^{N} d\big(T(n), S_v(w(n))\big). \tag{3} $$

The quantity D_v is the comparison score (or global average distance) for a likely response, R_v. The likely
response, R_v, 1 ≤ v ≤ V, which yields the minimum comparison score, D*, is the best candidate for identifying
the input test pattern, T(n).

In order to obtain a given comparison score, D_v, certain assumptions are made. First, it is assumed that the
beginning and ending frames of both the input and reference words have been accurately determined. The
first input frame n=1 is thus paired with the first reference frame m=1, or

$$ w(1) = 1. \tag{4} $$

Similarly, the last input frame n=N is paired with the last reference frame m=M:

$$ w(N) = M. \tag{5} $$

It is also assumed that the Itakura path constraints are obeyed:

$$ 0 \le w(n) - w(n-1) \le 2, \tag{6} $$

and

$$ w(n) - w(n-1) \ne 0 \quad \text{if} \quad w(n-1) - w(n-2) = 0. \tag{7} $$

These local path constraints guarantee that the average slope of the warping function w(n) lies between
1/2 and 2, and that the path is monotonic non-decreasing. In other words, the local path constraints define
acoustically reasonable and allowable paths.
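
The local path constraints can be checked on a candidate path as sketched below. This is only an illustration: the function name `obeys_local_constraints` and the representation of the path as a Python list (with `w[0]` holding w(1)) are assumptions made for the example.

```python
def obeys_local_constraints(w):
    """Check a candidate warping path against the two local constraints.

    w is a list of reference frame indices, w[0] = w(1), w[1] = w(2), ...
    Rule 1: each step w(n) - w(n-1) must be 0, 1, or 2.
    Rule 2: a zero step may not follow another zero step.
    """
    for n in range(1, len(w)):
        step = w[n] - w[n - 1]
        if step < 0 or step > 2:
            return False
        if step == 0 and n >= 2 and w[n - 1] - w[n - 2] == 0:
            return False
    return True


print(obeys_local_constraints([1, 1, 2, 4, 5]))  # single flat step, then rises
print(obeys_local_constraints([1, 1, 1, 2]))     # flat for two consecutive steps
```

Because at most every other step can be flat and no step exceeds 2, any path passing this check has an average slope between 1/2 and 2.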

The preceding endpoint and local path constraints may be summarized by a set of global path constraints:

$$ m_L(n) \le m \le m_H(n) \tag{8} $$

where

$$ m_H(n) = \min\left[ 2(n-1) + 1,\; M - \tfrac{1}{2}(N-n),\; M \right] \tag{9} $$

and

$$ m_L(n) = \max\left[ \tfrac{1}{2}(n-1) + 1,\; M - 2(N-n),\; 1 \right] \tag{10} $$

The global path constraints define the parallelogram (or window) shown in Figure 6. Allowable paths
include only points within the parallelogram.
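
The parallelogram can be computed frame by frame as in the sketch below. It assumes that the lower bound m_L(n) is the largest of the three lower limits (the slope-1/2 line from (1,1), the slope-2 line into (N,M), and 1) and that the upper bound m_H(n) is the smallest of the three upper limits, which is the reading under which m_L(n) ≤ m ≤ m_H(n) can hold. The function name `global_bounds` and the rounding to integer frame indices are also assumptions.

```python
import math


def global_bounds(n, N, M):
    """Return (m_low, m_high), the allowed reference frames for input frame n.

    N is the number of test frames and M the number of reference frames.
    Bounds are rounded inward to integer frame indices.
    """
    lower = max(0.5 * (n - 1) + 1, M - 2 * (N - n), 1)
    upper = min(2 * (n - 1) + 1, M - 0.5 * (N - n), M)
    return math.ceil(lower), math.floor(upper)


# Window for a 10-frame test pattern against a 12-frame reference sequence.
for n in (1, 5, 10):
    print(n, global_bounds(n, 10, 12))
```

If the lower bound exceeds the upper bound for some n, no allowable path exists, which happens when the two sequences differ in length by more than a factor of about two.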

The path w(n) yielding a minimum distance or comparison score, D_v, can be found by a dynamic
programming process. An accumulated distance, D_A, at any given pair of frames n and m is defined to be the sum of
the local distances d from point (1,1) to and including the present point (n,m), along the minimum distance or
"best" path between points (1,1) and (n,m). Accumulated distance D_A may be generated recursively from point
(1,1) to point (N,M) according to the following equation:

$$ D_A(n,m) = d(T(n), S_v(m)) + \min\left[ D_A(n-1,m)\,g(n-1,m),\; D_A(n-1,m-1),\; D_A(n-1,m-2) \right], \tag{11} $$

where the constraints are

$$ 1 \le n \le N, \quad m_L(n) \le m \le m_H(n) \tag{12} $$

and where g(n,m) is a nonlinear weighting

$$ g(n,m) = \begin{cases} 1, & \text{if } w(n) \ne w(n-1) \\ \infty, & \text{if } w(n) = w(n-1), \end{cases} \tag{13} $$

to guarantee that the optimum path to (n,m) does not stay flat for two consecutive frames. The desired
comparison score, D_v, for a likely response, R_v, is thereby equal to the accumulated distance D_A(N,M).
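
The recursion can be sketched in code as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the function name `dtw_score` is hypothetical, the weighting g is realized with a per-point flag recording whether the best path arrived by a flat step, the global window is not applied explicitly (unreachable points simply keep an infinite distance), and the usage lines use an absolute difference of scalars in place of the Itakura measure.

```python
import math


def dtw_score(test, ref, dist):
    """Accumulated distance D_A(N, M) between test and ref sequences.

    dist(a, b) is the local distance. Returns math.inf if no path obeying
    the endpoint and local path constraints exists.
    """
    N, M = len(test), len(ref)
    INF = math.inf
    # Index 0 is unused so that D[n][m] matches the 1-based notation.
    D = [[INF] * (M + 1) for _ in range(N + 1)]
    flat = [[False] * (M + 1) for _ in range(N + 1)]
    D[1][1] = dist(test[0], ref[0])            # endpoint constraint w(1) = 1
    for n in range(2, N + 1):
        for m in range(1, M + 1):
            candidates = []                    # (accumulated distance, is flat step)
            if m >= 2:
                candidates.append((D[n - 1][m - 1], False))
            if m >= 3:
                candidates.append((D[n - 1][m - 2], False))
            if not flat[n - 1][m]:             # g = infinity after a flat step
                candidates.append((D[n - 1][m], True))
            if not candidates:
                continue
            best, is_flat = min(candidates, key=lambda c: c[0])
            if best < INF:
                D[n][m] = dist(test[n - 1], ref[m - 1]) + best
                flat[n][m] = is_flat
    return D[N][M]                             # endpoint constraint w(N) = M


abs_diff = lambda a, b: abs(a - b)
print(dtw_score([1, 1, 2, 3], [1, 2, 3], abs_diff))
```

Dividing the returned value by N gives the average form of the score. Tracking only the single best predecessor per point is a common simplification and may occasionally miss a path that a full state-expanded search would find.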

This procedure may be performed for each likely response, R_v, providing values for D_v, 1 ≤ v ≤ V. The test
pattern, T(n), can be recognized as the likely response, R_v, with the minimum comparison score, D*, smaller
than a threshold for "good" scores.

List and Word Pattern Storage 

As discussed above, the illustrative speech recognizer embodiment employs list and pattern storage in
recognizing likely spoken responses to a request for information. The list comprises one or more likely responses,
each comprising a string of one or more references to stored word patterns. Each word pattern referenced
by a likely response comprises or is based upon one or more feature vectors derived from either
speaker-independent or -dependent data (that is, based on speech tokens from multiple people or a single person,
respectively). The contents of list and pattern storage may be determined from knowledge of likely user responses,
from experience (i.e., training) with a user, or both.

Knowledge of likely user responses is often derived from the associated request for information. Thus, list
responses and word patterns may be determined based upon the nature of the request (e.g., determined based 
upon the type of information sought) or the constraints placed on a response by the terms of the request (e.g., 
by choices given to a service user from which to select as a response). For example, if a request were to ask 
a user to specify a color, the nature of the request would suggest a list which included the responses "red," 
"blue," "orange," etc., and supporting patterns. On the other hand, if a request to specify a color included a 
menu of alternatives - "red," "green," or "yellow" - then these choices should be in the list as likely responses 
with supporting patterns provided. 

Knowledge of likely responses and associated patterns may also be obtained from the nature of the
information service itself. For example, if an information service is concerned with taking orders for automobile
parts, such words as "sparkplug," "muffler," "headlight," and "filter," among others, might be provided by a list
and pattern storage.

A list of likely responses and supporting patterns may be provided through experience or training
("training") with a user. Such training generally requires either manual user action, or the use of other speech
recognition techniques well-known in the art, such as a Vector Quantization Codebook scheme (see Linde, Buzo,
and Gray, An Algorithm for Vector Quantization Design, Vol. Com-28, No. 1, I.E.E.E. Transactions on
Communications, 84-95 (Jan. 1980)) or the "level building" technique of Myers and Rabiner (see Myers and Rabiner,
A Dynamic Time Warping Algorithm for Connected Word Recognition, Vol. ASSP-29, I.E.E.E. Trans. Acoust.,
Speech, Signal Processing, 284-97 (Apr. 1981)). Training may be performed prior to recognizer use as part
of a training mode, or during use in the form of one or more back-up procedures. Moreover, training provided
by speech recognition techniques may be performed locally or off-line and provided to the system via, e.g., a
read-only memory.

Manually provided training may require a user to provide data equivalent to a spoken response through
the use of an I/O device, such as a keyboard. This data is used to update the stored list. Manual training may
also involve creating or updating patterns in word pattern storage by requiring a user(s) to speak samples (or
tokens) of words one or more times. These samples, once processed by a feature measurement technique,
are used, e.g., to form one or more mean spectral vectors (i.e., one or more feature vectors) for a word pattern.
Each word pattern, P_w, is stored in pattern storage as a word to be referenced by list responses, R_v.

If a speech recognition scheme is used to provide training, the output of such a scheme may serve to
augment the list and update word pattern storage. The list may be updated by including a newly recognized
response as a likely response, R_v. Pattern storage may be updated by including recognized test pattern
information in the computation of, e.g., mean spectral vectors for word patterns.
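
One way to fold a newly recognized frame into a stored mean spectral vector is an incremental mean, sketched below. The function name `update_mean_vector` and the bookkeeping of a token count alongside each mean are assumptions of this example, and the sketch presumes the new vector has already been time-aligned with the frame it updates.

```python
def update_mean_vector(mean, count, new_vector):
    """Incorporate one more token's vector into a running mean.

    mean is the current mean vector over `count` tokens; returns the
    updated (mean, count) without needing the earlier tokens.
    """
    new_count = count + 1
    new_mean = [m + (x - m) / new_count for m, x in zip(mean, new_vector)]
    return new_mean, new_count


# A stored mean built from one token, updated with a second token.
print(update_mean_vector([1.0, 2.0], 1, [3.0, 4.0]))
```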

Whether through knowledge or training, the determination of one or more likely responses reflects a priori
probabilities that a given request will provoke such responses. If probable responses are known prior to
information service use, then these probable responses can be provided to a list with supporting pattern storage.
Regardless of whether any such responses are known prior to use, those likely responses determined through
training (either during a training mode or with use) may augment the list and update pattern storage.

Referring to Figure 5, the selection of patterns of a response (see 235) is directed, at least initially, to those
responses considered likely prior to training. However, if no responses are considered likely prior to training
(see 230), or if the list of likely responses fails to produce a recognized response with a comparison score below
the threshold for good recognized responses (see 260), one or more alternate procedures may be employed
to perform speech recognition and provide the recognized speech to update the list and pattern storage (see
265, 270, 275, 295).

A Connected-Digit Repertory Dialer

A further illustrative embodiment of the present invention concerns a connected-digit speech recognizer
for a telephone repertory dialer. With this embodiment, a user speaks a telephone number in a
connected-digit fashion (i.e., fluently) in response to an explicit or implicit request and the speech is recognized and
provided to an automatic dialer.

In this embodiment, a list is stored which comprises telephone numbers which are likely to be dialed by a
user. Likely numbers comprise numbers which either are or will be frequently dialed. Each digit or group of
digits of a likely number references a sequence of feature vectors in pattern storage.

A list of likely numbers may be built in any number of ways. For example, the list may be built manually 
through user entry of the likely numbers directly from a telephone keypad, either as part of a special mode 
providing for such entry (under the control of the processor), or as part of a back-up procedure when no list 
entry for the number exists. Also, the list may be built automatically by observation of normal telephone usage,
either locally (i.e., at the telephone itself) or by a node(s) in a network to which the telephone is connected. 
Whether built manually or automatically, locally or by a network, the list containing likely telephone numbers 
may be stored locally or at an external network location. 
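
Automatic list building from observed usage could look like the sketch below. It is an illustration only: the function name `build_likely_numbers`, the list size limit, and the minimum dial count used to decide what counts as "frequently dialed" are all assumptions, since the text leaves these choices open.

```python
from collections import Counter


def build_likely_numbers(dialed_numbers, max_entries=10, min_count=2):
    """Return the most frequently dialed numbers, most frequent first.

    max_entries caps the list size and min_count is the number of times a
    number must have been dialed to be considered likely (both assumed).
    """
    counts = Counter(dialed_numbers)
    return [number for number, count in counts.most_common(max_entries)
            if count >= min_count]


# Hypothetical call history observed at the telephone.
history = ["5551234", "5559876", "5551234", "5550000", "5559876", "5551234"]
print(build_likely_numbers(history))
```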

The pattern storage comprises speaker-independent feature vectors for the words corresponding to the
ten digits, zero through nine, and the usual associated words, such as "oh," "hundred," and "thousand." In
addition, the pattern storage may include patterns for one or more user command words, such as "off-hook,"
"dial," "hang-up," "yes," "no," etc.

Pattern storage may also include patterns for one or more names of people, businesses or services likely
to be called; that is, the names associated with likely numbers in the list. In this way, a number may be dialed
by the illustrative embodiment either as a result of a user speaking digits or by speaking the name of the person,
business or service to be called. A representation of a telephone number in a list may therefore relate to the
number itself, an associated name, or both (in which case an association in list memory between number and
name representations would be established). Telephone number information received from a user to be
recognized may comprise a number or an associated name.

The illustrative embodiment of a connected-digit speech recognizer 300 for a telephone repertory dialer
is presented in Figure 7. Telephone 301 serves as an I/O device used for entry of speech to be recognized.
The telephone 301 comprises an automatic dialer which requires input of a telephone number from the speech
recognizer 300. Thus, in this embodiment, the telephone 301 serves as the utilization device referenced in
Figure 3. The telephone 301 is coupled to an analog-to-digital (A/D) and digital-to-analog (D/A) converter 302.
The telephone 301 is also coupled to a processor 303 and memory 304 by bus 305. The A/D and D/A converter
302 is also coupled to bus 305, and thereby coupled to the processor 303 and memory 304. Processor 303
comprises a feature measurement processor and a dynamic time alignment processor. For a given illustrative
embodiment, processor 303 may further comprise a back-up speech recognition processor, such as a VQC
recognition processor.

The operation of the illustrative embodiment of Figure 7 is presented in the flow-chart 400 of Figure 8.

Upon receipt of a START command from telephone 301, the processor 303 waits to receive a digitized version
of a spoken telephone number to be dialed (see Fig. 8, 410). Contemporaneously, a spoken telephone number
is received by the telephone 301 and provided to the A/D converter 302 which, in turn, provides the digitized 
version of the spoken number, s(k), to the processor 303. Responsive to receipt of s(k), the processor 303
performs feature measurement on s(k) to produce a series of feature vectors, T(n) (see Fig. 8, 420) for storage
in memory 304. Assuming the list contains one or more likely telephone numbers (see Fig. 8, 430), DTW of
T(n) is performed with each number, R_v, in the list and a comparison score, D_v, is kept for each DTW performed
(see Fig. 8, 435, 440, 445, 450). 

The best comparison score, D*, from all comparison scores for the list is determined (see Fig. 8, 455) and,
if it is below a threshold (see Fig. 8, 460), the list entry corresponding to the best score, R*, is deemed to be
the telephone number spoken. Therefore, the number, R*, is provided to the telephone 301 via bus 305 for
dialing. 
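
The list comparison described above can be sketched in Python as follows. This is a minimal illustration, not the embodiment's implementation: the function names, the Euclidean frame distance, the path-length normalization, and the dictionary layout of the list are all assumptions made for the example.

```python
import math


def dtw_distance(test, ref):
    """Dynamic time warping score between two sequences of feature vectors.

    Assumes Euclidean distance between frames and normalizes by the combined
    sequence length; both choices are illustrative assumptions.
    """
    n, m = len(test), len(ref)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(test[i - 1], ref[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def best_list_entry(test, references, threshold):
    """Score the test pattern against every list entry and keep the best.

    Returns (entry, score) when the best score is below the threshold,
    otherwise (None, score) so that a caller can fall back to another technique.
    """
    best_entry, best_score = None, math.inf
    for entry, ref in references.items():
        score = dtw_distance(test, ref)
        if score < best_score:
            best_entry, best_score = entry, score
    if best_score < threshold:
        return best_entry, best_score
    return None, best_score
```

With a list holding two hypothetical numbers, a test pattern that is a time-stretched copy of one reference scores zero against it and is accepted, while a pattern far from every reference is rejected.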

If the best score, D*, is not below the threshold, or if the list contained no entries of likely numbers to be
dialed, alternative or back-up techniques for speech recognition are performed. For purposes of this illustrative
embodiment, a first technique comprises Vector Quantization Codebook (VQC) recognition (see Fig. 8, 466).
VQC recognition techniques are well known in the art. See, Pan, Soong and Rabiner, A Vector-Quantization-
Based Preprocessor for Speaker-Independent Isolated Word Recognition, Vol. ASSP-33, No. 3, I.E.E.E. Trans-
actions on Acoust., Speech, and Signal Processing, 546-60 (June 1985); see also U.S. Patent No. 4,860,385,
which is hereby incorporated by reference as if set forth fully herein; see also Shore and Burton, Discrete Ut-
terance Speech Recognition Without Time Alignment, Vol. IT-29, No. 4, I.E.E.E. Transactions on Information
Theory, 473-91 (July 1980). 

If the VQC recognition is successful (see Fig. 8, 470), the recognized telephone number is provided to the 
telephone 301 for dialing (see Fig. 8, 490).

If the VQC recognizer fails to recognize the spoken number (see Fig. 8, 470), then the user is prompted 
by this embodiment to dial the number manually (see Fig. 8, 475) with telephone 301. 
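
The order in which the techniques are tried (list comparison, then a back-up recognizer, then manual entry) can be sketched as below. The three callables are hypothetical placeholders supplied by the caller; the sketch assumes each recognizer returns None on failure, which is an assumption of this example rather than something stated in the text.

```python
def recognize_number(test_pattern, list_recognizer, backup_recognizer, prompt_manual_entry):
    """Try list-based recognition, then a back-up recognizer, then manual entry.

    Returns (number, source), where source records which stage produced the
    number so that the list can later be updated accordingly.
    """
    number = list_recognizer(test_pattern)
    if number is not None:
        return number, "list"
    number = backup_recognizer(test_pattern)
    if number is not None:
        return number, "backup"
    # Neither recognizer succeeded: ask the user to dial manually.
    return prompt_manual_entry(), "manual"
```

Returning the source alongside the number is a design choice of the sketch: it lets the caller tell a first-time number (recognized by a back-up stage) from one already in the list.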

As it concerns any speech recognition task (i.e., telephone numbers or commands), this illustrative em-
bodiment may also provide a user with an opportunity to reject recognized speech. Under such circumstances,
another technique (e.g., a back-up technique) or manual entry may be employed.

Regardless of how the number is dialed, information concerning the dialed number is used to update the 
list (see Fig. 8, 500). The update to the list may involve storage of a telephone number not previously stored 
therein such that future attempts at dialing the number may be recognized without resorting to a back-up pro-
cedure. It may also involve using test pattern information to update the training of feature vectors for word pat-
terns. It may further involve storing information concerning the usage of the telephone number by the user,
such as the number of times the telephone number has been dialed or the date of last dialing. Such usage
information may be employed in a likely response comparison scheme wherein likely responses are arranged
in order of likelihood and a received response is identified, tentatively or otherwise, as the first encountered
response which yields an acceptable comparison score. Such usage information may also be used as a basis
for dropping or replacing a number previously stored in the list (e.g., if storage space is limited).
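
A minimal sketch of such list maintenance is given below, assuming a fixed-capacity list keyed by telephone number. The class and function names, the eviction rule (least dialed, then least recently dialed), and the "lower score is better" convention are illustrative assumptions, not details taken from the embodiment.

```python
from dataclasses import dataclass


@dataclass
class ListEntry:
    number: str
    dial_count: int = 0
    last_dialed: float = 0.0


class LikelyNumberList:
    """Fixed-capacity list of likely numbers with simple usage statistics."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries = {}

    def record_dial(self, number, now):
        """Store a newly dialed number or update the statistics of a known one."""
        entry = self.entries.get(number)
        if entry is None:
            if len(self.entries) >= self.capacity:
                self._evict()
            entry = ListEntry(number)
            self.entries[number] = entry
        entry.dial_count += 1
        entry.last_dialed = now

    def _evict(self):
        # Drop the least used entry; ties go to the least recently dialed.
        victim = min(self.entries.values(), key=lambda e: (e.dial_count, e.last_dialed))
        del self.entries[victim.number]

    def in_likelihood_order(self):
        """Entries ordered from most to least likely, judged by usage."""
        return sorted(self.entries.values(), key=lambda e: (-e.dial_count, -e.last_dialed))


def first_acceptable(entries, score_fn, threshold):
    """Return the first entry, in the given order, whose score is below threshold."""
    for entry in entries:
        if score_fn(entry.number) < threshold:
            return entry.number
    return None
```

Searching in likelihood order and stopping at the first acceptable score trades some accuracy for speed compared with scoring every entry and taking the overall best.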

Just as telephone numbers to be dialed may be recognized through storage in the list, so may command
words which control overall recognizer function. So, for example, speaker-independent vector patterns for
words such as "off-hook," "dial," "hang-up," "yes," "no," etc., may be included in pattern storage and referenced
in the list to provide hands-free operation of a telephone incorporating this embodiment of the present inven-
tion. In this embodiment, the voice command "dial" may be recognized and used to prompt the processing of
a spoken telephone number through the issuance of a START command.



Claims

1. A method for resolving uncertainty in information provided to an information service, the method com-
prising the steps of:

storing in a database a list of one or more likely responses to a request for information;
receiving information from a user of the information service in response to the request for infor-
mation; and

comparing received information with one or more of the likely responses in the list to identify such
received information.



2. The method of claim 1 wherein the step of storing a list of one or more likely responses in a database 
comprises the step of determining a likely response based on an a priori probability that a response will 
be provoked by the request. 

3. The method of claim 2 wherein the step of determining a likely response comprises the step of determining
an a priori probability based on training with a user. 

4. The method of claim 3 wherein training is provided by a back-up procedure for resolving uncertainty.

5. The method of claim 2 wherein the step of determining a likely response comprises the step of determining 
an a priori probability based on the nature of the information service. 

6. The method of claim 2 wherein the step of determining a likely response comprises the step of determining
an a priori probability based on the nature of the request for information. 

7. The method of claim 2 wherein the step of determining a likely response comprises the step of determining
an a priori probability based on constraints placed on responses by the request for information.

8. The method of claim 1 wherein the step of comparing received information to one or more likely responses
comprises the step of determining a comparison score for a likely response. 

9. The method of claim 8 wherein the step of determining a comparison score comprises the step of deter- 
mining whether a comparison score is within a range of acceptable comparison scores to identify received 
information. 

10. The method of claim 9 further comprising the step of performing a back-up uncertainty resolution tech- 
nique when no comparison score is within the range of acceptable comparison scores. 

11. The method of claim 10 further comprising the step of updating the list of stored likely responses with 
results of the back-up uncertainty resolution technique. 

12. The method of claim 1 further comprising performing a back-up uncertainty resolution technique when
two or more comparison scores are within the range of acceptable comparison scores. 

13. The method of claim 1 further comprising the step of maintaining likely response usage statistics based 
on identified received information. 

14. The method of claim 1 further comprising the step of the user rejecting an identification of received in- 
formation. 

15. The method of claim 14 further comprising the step of performing a back-up uncertainty resolution tech-
nique.

16. A method for speech recognition comprising the steps of: 

storing in a database a list of one or more representations of likely spoken responses to a request 
for information; 

receiving speech information from a user in response to the request; and 
comparing received speech information to one or more representations of likely responses in the
list to recognize such received speech information.

17. The method of claim 16 wherein the likely spoken responses comprise telephone numbers. 

18. The method of claim 17 further comprising the step of dialing the likely telephone number corresponding
to recognized speech information. 

19. The method of claim 16 wherein the step of storing a list of one or more representations of likely spoken 
responses comprises the steps of: 

storing one or more word patterns comprising one or more feature vectors; and

storing one or more references to stored word patterns as a representation of a likely spoken re- 
sponse. 

20. The method of claim 19 wherein the step of storing one or more word patterns comprises the step of de-
termining such word patterns with a feature measurement technique.

21. The method of claim 20 wherein the feature measurement technique comprises linear predictive coding. 

22. The method of claim 19 further comprising the step of updating a stored word pattern with recognized 
received speech information. 

23. The method of claim 16 wherein the step of storing in a database a list of one or more representations of
likely spoken responses comprises the step of determining a likely spoken response based on an a priori 
probability that a spoken response will be provoked by the request. 

24. The method of claim 23 wherein the step of determining a likely spoken response comprises the step of
determining an a priori probability based on training with a user. 

25. The method of claim 24 wherein training is provided by a back-up procedure for recognizing speech.

26. The method of claim 25 wherein the back-up procedure comprises vector quantization codebook speech
recognition on received speech information. 

27. The method of claim 25 wherein the back-up procedure comprises the user supplying an equivalent to 
the received speech information with use of an input device. 

28. The method of claim 16 wherein the step of receiving speech information comprises the step of producing
a test pattern of received information by a feature measurement technique. 

29. The method of claim 28 wherein the feature measurement technique comprises linear predictive coding. 

30. The method of claim 16 wherein the step of comparing received speech information to one or more likely
spoken responses comprises the step of determining a comparison score for a likely spoken response. 

31. The method of claim 30 wherein the step of determining a comparison score for a likely response com- 
prises the step of performing dynamic time alignment between received speech information and a likely 
spoken response. 

32. The method of claim 31 wherein the step of performing dynamic time alignment between received speech
information and a likely spoken response comprises the step of performing dynamic time warping. 

33. The method of claim 30 wherein the step of determining a comparison score comprises the step of de- 
termining whether a comparison score is within a range of acceptable comparison scores to recognize 
received speech information. 

34. The method of claim 33 further comprising the step of performing a back-up speech recognition technique 
when no comparison score is within the range of acceptable comparison scores. 

35. The method of claim 34 further comprising the step of updating the list of stored representations of likely 
spoken responses with results of the back-up speech recognition technique. 

36. The method of claim 16 further comprising the step of maintaining likely response usage statistics based
on recognized received speech information. 

37. The method of claim 16 further comprising the steps of: 

the user rejecting a recognition of received speech information; and 
performing a back-up speech recognition technique. 

38. The method of claim 37 further comprising the step of updating the list of stored representations of likely 
spoken responses with results of the back-up speech recognition technique. 

39. The method of claim 16 further comprising the step of updating the list of stored representations of likely
spoken responses with recognized received speech information. 

40. The method of claim 16 wherein the step of comparing comprises the steps of: 

comparing received speech information to each stored representation of a likely spoken response; 

and 

determining the likely spoken response whose representation most closely compares to received 
speech information. 

41. An apparatus for resolving uncertainty in information received from an input device to be provided to an 
information service, the information received in response to a request for information, the apparatus com-
prising: 

a database storing one or more responses to the request for information based on a priori prob- 
abilities that such responses will be provoked by the request; and 

a comparator, coupled to the database and the input device, for comparing received information 
with one or more responses in the list to identify such received information. 

42. A system for recognizing spoken telephone number information, the telephone number information re- 
ceived from an input device, the system comprising:

a database for storing one or more representations of telephone numbers likely to be spoken; and
a comparator, coupled to the database and the input device, for comparing spoken telephone num- 
ber information with one or more representations of stored telephone numbers to recognize such infor- 
mation as a stored representation of a telephone number. 

43. The system of claim 42 further comprising an automatic dialer, coupled to the comparator, for dialing the 
telephone number associated with the recognized information. 

44. The system of claim 43 wherein the coupling between the automatic dialer and the comparator comprises 
a network. 

45. The system of claim 42 wherein the comparator comprises a feature measurement processor, coupled to 
the input device, for performing feature measurements on the spoken telephone number information. 

46. The system of claim 45 wherein the comparator further comprises a dynamic time alignment processor,
coupled to the database and the feature measurement processor, for performing dynamic time alignment
between feature measurements of the spoken telephone number information and one or more represen-
tations of stored telephone numbers.

47. The system of claim 42 wherein the database storing one or more representations of telephone numbers 
comprises:

one or more word patterns comprising one or more feature vectors; and 
one or more references to stored word patterns as a representation of a likely spoken telephone 
number. 

48. The system of claim 42 further comprising a back-up speech recognizer for recognizing spoken telephone
number information when the comparator does not recognize such information. 

49. The system of claim 48 wherein the back-up speech recognizer comprises a vector quantization codebook 
recognizer. 

50. The system of claim 42 wherein the coupling between the database and the comparator comprises a net- 
work. 

51. The system of claim 42 wherein the coupling between the input device and the comparator comprises a 
network. 

52. A database for use with a speech recognition system coupled thereto, the database comprising one or 
more likely responses to a request for information, each such likely response having associated therewith
an a priori probability that the response will be provoked by the request.

53. The database of claim 52 wherein the coupling of the database and the speech recognition system com-
prises a network.

Abstract

In a telecommunication system, automatic directory 
assistance uses a voice processing unit comprising a database of 
vocabulary items and data representing a predetermined
relationship between each vocabulary item and a calling number
in a location served by the automated directory assistance
apparatus. The voice processing unit issues messages to a caller
making a directory assistance call to prompt the caller to utter
a required one of said vocabulary items. The unit detects a
calling number originating a directory assistance call and,
responsive to the calling number and the relationship data,
computes a probability index representing the likelihood of a
vocabulary item being the subject of the directory assistance 
call. The unit employs a speech recognizer to recognize, on the 
basis of the acoustics of the caller's utterance and the 
probability index, a vocabulary item corresponding to that 
uttered by the caller.

METHOD AND APPARATUS FOR AUTOMATION OF DIRECTORY ASSISTANCE USING
SPEECH RECOGNITION.

The invention relates to a method and apparatus for 
providing directory assistance, at least partially automatically, 
to telephone subscribers. 

In known telephone systems, a telephone subscriber requiring 
directory assistance will dial a predetermined telephone number. 
In North America, the number will typically be 411 or 555-1212.
When a customer makes such a directory assistance call, the
switch routes the call to the first available Directory
Assistance (DA) operator. When the call arrives at the
operator's position, an initial search screen at the operator's
terminal will be updated with information supplied by the switch,
Directory Assistance Software (DAS), and the Operator Position
Controller (TPC). The switch supplies the calling number and the
DBMS call identifier, the DAS supplies the default locality and 
zone, and the TPC supplies the default language indicator. While 
the initial search screen is being updated, the switch will 
connect the subscriber to the operator. 

When the operator hears the "customer-connected" tone, the
operator will proceed to complete the call. The operator will 
prompt for locality and listing name before searching the 
database. When a unique listing name is found, the operator will 
release the customer to the Audio Response Unit (ARU), which will
play the number to the subscriber. 

Telephone companies handle billions of directory assistance 
calls per year, so it is desirable to reduce labour costs by
minimizing the time for which a directory assistance operator is
involved. As described in U.S. patent No. 5,014,303 (Velius)
issued May 7, 1991, the entire disclosure of which is
incorporated herein by reference, a reduction can be achieved by 
directing each directory assistance call initially to one of a 
plurality of speech processing systems which would elicit the 
initial directory assistance request from the subscriber. The 
speech processing system would compress the subscriber's spoken 
request and store it until an operator position became available, 
whereupon the speech processing system would replay the request 
to the operator. The compression would allow the request to be 
replayed to the operator in less time than the subscriber took 
to utter it. 

Velius mentions that automatic speech recognition also could 
be employed to reduce the operator work time. In a paper 
entitled "Multiple-Level Evaluation of Speech Recognition 
Systems....", the entire disclosure of which is incorporated 
herein by reference, John P. Pitrelli et al. disclose a partially
automated directory assistance system in which speech recognition 
is used to extract a target word, for example a city name, from 
a longer utterance. The system strips off everything around the 
target word so that only the target word is played back to the 
operator. The operator initiates further action. 

US patent No. 4,797,910 (Daudelin) issued January 10, 1989, 
the entire disclosure of which is incorporated herein by 
reference, discloses a method and apparatus in which operator 
involvement is reduced by means of a speech recognition system
which recognizes spoken commands to determine the class of call
and hence the operator to which it should be directed. The
savings to be achieved by use of Daudelin's speech recognition
system are relatively limited, however, since it is not capable
of recognizing anything more than a few commands, such as
"collect", "calling card", "operator", and so on.

These known systems can reduce the time spent by a directory
assistance operator in dealing with directory assistance calls,
but only to a very limited extent.

An object of the present invention is to provide an improved
automated directory assistance system capable of reducing, or
even eliminating, operator involvement in directory assistance
calls. To this end, in preferred embodiments of the present
invention a speech recognition system elicits a series of
utterances by a subscriber and, in dependence upon a listing name
being recognized, initiates automatic accessing of a database to
determine a corresponding telephone number.

The system may be arranged to transfer or "deflect" a
directory assistance call to another region when it recognizes
that the subscriber has uttered the name of a location which is
outside its directory area.

Preferably, the system accesses the database taking account
of a priori call distribution. A priori call distribution
weights the speech recognition decision to take account of a
predetermined likelihood that a particular destination will be
sought by a caller, conveniently based upon the caller's number.

According to one aspect of the invention, automated
directory assistance apparatus for at least partially automating
directory assistance in a telephone system comprises a voice
processing unit comprising a database of vocabulary items and
data representing a predetermined relationship between each
vocabulary item and a calling number in a location served by the
automated directory assistance apparatus, means for issuing
messages to a caller making a directory assistance call to prompt
the caller to utter a required one of said vocabulary items,
means for detecting a calling number originating a directory
assistance call, means responsive to a caller identifier, for
example the calling number, and said data for computing a
probability index representing the likelihood of a vocabulary
item being the subject of the directory assistance call, and
speech recognition means for recognizing, on the basis of the
acoustics of the caller's utterance and the probability index,
a vocabulary item corresponding to that uttered by the caller.

Embodiments of the invention may comprise means for
prompting a subscriber to specify a location, means for detecting
a place name uttered in response, means for comparing the uttered
place name with a database and, in dependence upon the results of
the comparison, selecting a message, and means for playing the
message to the subscriber. If the place name has been identified
precisely as a city or location name, the message may be an NPA.
Alternatively the message could be to the effect that the
number is in a different calling or directory area and offer to
give the subscriber the area code. In that case, the speech
recognition system would be capable of detecting a positive
answer and supplying the appropriate area code from the data
base. Another variation is that the customer could be asked if 
the call should be transferred to the directory assistance in the 
appropriate area. If the subscriber answered in the affirmative, 
the system would initiate the call transfer. 

As mentioned, the recognition system preferably makes its
choice based upon a predetermined probability index derived using 
an identifier such as calling number. The probability index will 
bias the selection in favour of, for example, addresses in the 
same geographical area, such as the same city. 

The probability index need not be geographical, but might 
be temporal, perhaps according to time-of-day, or week or year. 
For example, certain businesses, such as banks, are unlikely to 
be called at one o'clock in the morning whereas taxi firms are. 
Likewise, people might call a ski resort in winter but not in 
summer. Hence the nature of the business can be used to weight 
the selection of certain portions or segments of the data base 
for a particular enquiry. 
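
One way such a probability index could weight the recognition decision is sketched below. The text does not specify the combination rule; the log-linear form, the function name, the `prior_weight` parameter, and the probability floor are all assumptions made for illustration, and the city names in the usage note are arbitrary examples.

```python
import math


def rescore_with_prior(acoustic_log_likelihoods, prior, prior_weight=1.0, floor=1e-6):
    """Combine acoustic scores with an a priori probability index.

    acoustic_log_likelihoods: non-empty dict mapping vocabulary item to the
        recognizer's log-likelihood for the utterance.
    prior: dict mapping vocabulary item to its a priori probability, for
        example derived from the calling number or the time of day.
    Items missing from the prior are given a small floor so that they stay
    selectable rather than being ruled out entirely.
    """
    scores = {}
    for item, log_lik in acoustic_log_likelihoods.items():
        p = max(prior.get(item, 0.0), floor)
        scores[item] = log_lik + prior_weight * math.log(p)
    best = max(scores, key=scores.get)
    return best, scores
```

For instance, with acoustic log-likelihoods of -10.0 for "Ottawa" and -9.5 for "Oshawa" and priors of 0.6 and 0.05 respectively, the prior tips the decision to "Ottawa"; setting `prior_weight` to 0 reverts to the purely acoustic choice.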

The discourse between the speech recognition system and the 
subscriber may be recorded. If the system disposes of the call 
entirely without the assistance of the operator, the recording 
could be erased immediately. On the other hand, if the call 
cannot be handled entirely automatically, at the point at which
the call is handed over to the operator, the recording of the 
entire discourse, or at least the subscriber's utterances, could 
be played back to the operator. Of course, the recording could 
be compressed using the prior art techniques mentioned above.

According to a second aspect of the invention, a method of
at least partially automating directory assistance in a telephone
system comprising a voice processing unit comprising a database
of vocabulary items and data representing a predetermined
relationship between each vocabulary item and a calling number
in a location served by the automated directory assistance
apparatus, comprises the steps of issuing messages to a caller
making a directory assistance call to prompt the caller to utter
a required one of said vocabulary items, detecting a calling
number originating a directory assistance call, computing, in
response to the calling number and said data, a probability index
representing the likelihood of a vocabulary item being the 
subject of the directory assistance call, and employing speech 
recognition means to recognize, on the basis of the acoustics of 
the caller's utterance and the probability index, a vocabulary 
item corresponding to that uttered by the caller. 

An embodiment of the invention will now be described by way 
of example only and with reference to the accompanying drawings 
in which: 

Figure 1 is a general block diagram of a known 
telecommunications system; 

Figure 2 is a simplified block diagram of parts of a 
telecommunications system employing an embodiment of the present 
invention; 

Figures 3A and 3B are a general flow chart illustrating the 
processing of a directory assistance call in the system of Figure 
2;

Figure 4 is a chart illustrating the frequency with which
certain cities are requested by callers in the same or other cities;
and 

Figure 5 is a graph of call distribution according to 
distance and normalized for population of the called city. 

Figure 1 is a block diagram of a telecommunications system 
as described in US patent number 4,797,910. As described 
therein, block 1 is a telecommunications switch operating under 
stored program control. Control 10 is a distributed control 
system operating under the control of a group of data and call 
processing programs to control various parts of the switch. 
Block 12 is a voice and data switching network capable of 
switching voice and/or data between inputs connected to the 
switching network. An automatic voice processing unit 14 is 
connected to the switching network 12 and controlled by control 
10. The automated voice processing unit receives input signals 
which may be either voice or dual tone multifrequency (DTMF)
signals and is capable of determining whether or not the DTMF 
signals are allowable DTMF signals and initiating action 
appropriately. In the system described in US patent number
4,797,910, the voice processing unit has the capability to 
distinguish among the various elements of a predetermined list 
of spoken responses. The voice processing unit 14 also has the 
capability to generate tones and voice messages to prompt a 
customer to speak or key information into the system for 
subsequent recognition by the voice recognition unit. In 
addition, the voice processing unit 14 is capable of recording
a short customer response for subsequent playback to a called
terminal. The voice processing unit 14 generates an output data
signal, representing the result of the voice processing. This
output data signal is sent to control 10 and used as an input to
the program for controlling establishment of connections in
switching network 12 and for generating displays for operator 
position 24. In order to set up operator assistance calls, 
switch 1 uses two types of database system. Local data base 16 
is directly accessible by control 10 via switching network 12.
Remote data base system 20 is accessible to control 10 via
switching network 12 and interconnecting data network 18. A 
remote data base system is typically used for storing data that 
is shared by many switches. For example, a remote data base 
system might store data pertaining to customers for a region; the
particular remote data base system that is accessed via data
network 18 would be selected to be the remote data base 
associated with the region of the called terminal. 
Interconnecting data network 18 can be any well known data 
network and specifically could be a common channel signalling
system such as the international standard telecommunications
signalling system CCS 7. 

Transaction recorder 22 is used for recording data about 
calls for subsequent processing. Typically, such data is billing 
data. The transaction recorder 22 is also used for recording
traffic data in order to engineer additions properly and in order
to control traffic dynamically.

The present invention will be employed in a
telecommunications system which is generally similar to that
described in US patent number 4,797,910. Figure 2 is a
simplified block diagram of parts of the system involved in a
directory assistance call, corresponding parts having the same
reference numbers in both Figure 1 and Figure 2. As shown in
Figure 2, block 1 represents a telecommunications switch
operating under stored program control provided by a distributed
control system operating under the control of a group of data and
call processing programs to control various parts of the switch.
The switch 1 comprises a voice and data switching network 12
capable of switching voice and/or data between inputs and outputs
of the switching network. As an example, Figure 1 shows a trunk
circuit 31 connected to an input of the network 12. A caller's
station apparatus or terminal 40 is connected to the trunk
circuit 31 by way of network routing/switching circuitry 30 and
an end office 33. The directory number of the calling terminal,
identified, for example, by automatic number identification, is
transmitted from the end office switch 33 connecting the calling
terminal 40 to switch 1.

An operator position controller 23 connects a plurality of 
operator positions 24 to the switch network 12. A data/voice
link 27 connects an automated voice processing unit 14A to the
switching network 12. The automated voice processing unit 14A
will be similar to that described in US patent number 4,797,910
in that it is capable of generating tones and voice messages to
prompt a customer to speak or key dual tone multifrequency (DTMF)
information into the system, determining whether or not the DTMF
signals are allowable DTMF signals and initiating action
appropriately, and applying speech recognition to spoken inputs.
In addition, the voice recognition unit 14A is capable of 
recording a short customer response for subsequent playback to 
a human operator. Whereas in US patent number 4,797,910, 
however, the voice processing unit 14 merely has the capability 
to distinguish among various elements of a very limited list of 
spoken responses to determine the class of the call and to which 
operator it should be directed, voice processing unit 14A of 
Figure 2 is augmented with software enabling it to handle a major 
part, and in some cases all, of a directory assistance call. 

Each operator position 24 comprises a terminal which is used 
by an operator to control operator assistance calls. Data 
displays for the terminal are generated by operator position 
controller 23. 

In order to provide the enhanced capabilities needed to
automate directory assistance calls, at least partially, the
voice processing unit 14A will employ flexible voice recognition
technology and a priori probabilities. For details of a suitable
flexible voice recognition system the reader is directed to
Canadian patent application number 2,069,675 filed May 27, 1992,
the entire disclosure of which is incorporated herein by
reference. A priori probability uses the calling number to
determine a probability index which will be used to weight the
speech recognition based upon the phonetics of the caller's
utterances. The manner in which the a priori probabilities are
determined will be described in more detail later with reference
to Figures 4 and 5.

As shown in Figure 2, in embodiments of the present
invention, when the voice processing unit 14 receives a directory
assistance call, it determines in step 301 whether or not the
number of the calling party is included. If it is not, the voice
processing unit immediately redirects the call for handling by
a human operator in step 302. If the calling number is included,
in step 303 the voice processing unit issues a bilingual greeting
message to prompt the caller for the preferred language. At the
same time, the message may let the caller know that the service
is automated, which may help to set the caller in the right frame
of mind. Identification of language choice at the outset
determines the language to be used throughout the subsequent
process, eliminating the need for bilingual prompts throughout
the discourse and allowing the use of a less complex
speech recognition system.

If no language is selected, or the answer is unrecognizable, 
the voice processing unit 14 hands off the call to a human 
operator in step 304 and plays back to the operator whatever 
response the caller made in answer to the prompt for language 
selection. It will be appreciated that the voice processing unit 
14 records at least the caller's utterances for subsequent 
playback to the operator, as required.

If the caller selects French or English, in step 305 the
voice processing unit 14 uses the calling number to set a priori
probabilities to determine the likelihood of certain locality
names being requested. The voice processing unit has a basic
vocabulary of localities, e.g. numbering plan areas (NPA), which
it can recognize and a listing of latitudes and longitudes for
determining geographical location for calling numbers. In step
305, the voice processing unit computes probabilities for the
entire vocabulary based upon distance from the locality of the
calling number and population and also within the calling
number's own area code or locality. In step 306, the voice
processing unit issues the message "For what city?" to prompt the
caller to state the name of the city, identifying the locality,
and tries to recognize the name from its vocabulary using speech
recognition based upon the acoustics, as described in the
aforementioned Canadian patent application number 2,069,675. The
voice processing unit will use the a priori probabilities to
influence or weight the recognition process. If the locality
name cannot be recognized, decision steps 307 and 308 cause a
message to be played, in step 309, to prompt the caller for
clarification. The actual message will depend upon the reason
for the lack of recognition. For example, the caller might be
asked to simply speak more clearly. Decision step 308 permits
a limited number of such attempts at clarification before handing
the call off to a human operator in step 310. The number of
attempts will be determined so as to avoid exhausting the
caller's patience.
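
The prompt-and-clarify loop of steps 306 to 310 might be sketched as
follows. This is only an illustration under stated assumptions: the
callback recognize and the parameter max_attempts are hypothetical
names, and the text does not specify the actual attempt limit.

```python
def prompt_until_recognized(recognize, max_attempts):
    """Run the initial prompt plus any clarification prompts.

    recognize(attempt) is a hypothetical callback that plays the prompt
    for the given attempt number and returns the recognized locality
    name, or None when recognition fails.

    Returns (name, handed_off), where handed_off is True when the
    attempt limit was reached and the call goes to a human operator.
    """
    for attempt in range(max_attempts):
        name = recognize(attempt)
        if name is not None:
            return name, False
    # Attempt limit reached without a recognizable answer.
    return None, True
```

Keeping the limit as a parameter reflects the statement that the number
of attempts is chosen so as not to exhaust the caller's patience.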

If the locality name is recognized, the voice recognition
unit determines in step 311 whether or not the locality is served
by the directory assistance office handling the call. If it is
not, the voice processing unit will play a "deflection" message
in step 312 inviting the caller to call directory assistance for
that area. It is envisaged that, in some embodiments of the
invention, the deflection message might also give the area code
for that locality and even ask the caller if the call should be
transferred.

If the requested locality is served by the directory
assistance office handling the call, in step 313 the voice
processing unit will transmit a message asking the caller to
state whether or not the called party has a business listing and
employs speech recognition to recognize the caller's response.
If the response cannot be recognized, decision steps 314 and 315
and step 316 will cause a message to be played to seek
clarification. If a predetermined number of attempts at
clarification have failed to elicit a recognizable response,
decision step 315 and step 317 hand the call off to a human
operator. If a response is recognized in step 314, decision step
318 determines whether or not a business was selected. If not,
step 319 plays the message "For what listing?" and, once the
caller's response has been recorded, hands off to the human
operator.

If decision step 318 indicates that the required number is
a business listing, in step 320 the voice processing unit plays
a message "For what business name?" and employs speech
recognition to recognize the business name spoken by the caller
in reply. Once again, the recognition process involves an
acoustic determination based upon the phonetics of the response
and a priori probabilities.

If the business name cannot be recognized, in steps 321, 322 
and 323 the unit prompts the caller for clarification and, as 
before, hands off to a human operator in step 324 if a 
predetermined number of attempts at clarification fail. 

It should be noted that, when the unit hands off to a human 
operator in step 310, 317, 319 or 324, the operator's screen will 
display whatever data the automatic system has managed to 
determine from the caller and the recording of the caller's 
responses will be replayed. 

If the unit recognizes the business name spoken by the 
caller, in step 325 the unit determines whether or not the data 
base lists a main number for the business. If not, the unit 
hands off to the human operator in step 326 and language, 
locality and selected business will be displayed on the 
operator's screen. If there is a main number for the business, 
in step 327 the unit plays a message asking if the caller wants 
the main number and uses speech recognition to determine the 
answer. If the caller's response is negative, step 328 hands off
to the human operator. If the caller asks for the main number,
however, in step 329 the unit instructs the playing back of the 
main number to the caller, and terminates the interaction with 
the caller. 

As mentioned earlier, the use of a priori probabilities 
enhances the speech recognition capabilities of the voice 
processing unit 14A. Statistics collected from directory
assistance data show a relation between the origin of a call and
its destination. An a priori model of probability that a person
at a phone number NPA/NXX asks for a locality $l_j$ is an
additional piece of information which improves the recognition
performance. The a priori model expresses the probability
$P(l_j \mid l_i)$ of someone calling from locality $l_i$ and
requesting a locality $l_j$. The probability $P(l_j \mid l_i)$
depends on the population of $l_j$ and the distance between
$l_i$ and $l_j$. The input call locality $l_i$ is not known
precisely. From the input phone number NPA/NXX,
the Central Office (CO) may be identified using the Bellcore
mapping. Following that step, a set of input localities related
to that Central Office is considered. The probability of calling
a locality $l_j$ from a phone number NPA/NXX is:

$P(l_j \mid \mathrm{NPANXX}) = \sum_{l_i \in CO} P(l_i) \, P(l_j \mid l_i)$ (Eq 1)

The probability $P(l_i)$ of each calling locality $l_i$ associated
with a CO is proportional to its population. Finally, the total
recognition score for each locality is a weighted sum of the
usual acoustic likelihood $\log P(Y_1 Y_2 \ldots Y_N \mid O_j)$ and
the logarithm of $P(l_j \mid \mathrm{NPANXX})$,
where $O_j$ is the orthography of the location $l_j$. An a priori model
may be arranged to distinguish between populations having French
or English as their first language. Knowing the language
selected by the user, the population using that language is used
to estimate $P(l_j \mid l_i)$. A minimum value of 10% of the population
is used to avoid excessively penalizing a language.
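
As a rough illustration of Eq 1, the sketch below sums over the
localities served by one Central Office, weighting each by its share of
the population. The function and argument names are hypothetical, and
the data in the usage note are made up for the example.

```python
def prior_given_npanxx(co_localities, population, p_request, target):
    """Eq 1: P(target | NPA/NXX) = sum over l_i in CO of P(l_i) * P(target | l_i).

    co_localities: localities served by the caller's Central Office.
    population:    dict mapping locality -> population.
    p_request:     hypothetical callable (target, origin) -> P(target | origin).
    """
    total = sum(population[loc] for loc in co_localities)
    return sum(
        # P(l_i) is taken proportional to the population of l_i.
        (population[loc] / total) * p_request(target, loc)
        for loc in co_localities
    )
```

For instance, with two calling localities of population 3000 and 1000
whose request probabilities for a target are 0.2 and 0.6, the result is
0.75 * 0.2 + 0.25 * 0.6 = 0.30.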

As an example, an a priori probability model developed using 
directory assistance data collected in the 514, 418 and 819 area 
codes, is shown graphically in Figure 4. In each of these area 
codes, the number of requests to and from each NXX was collected; 
faint lines appear indicating the frequency of "any city 
requesting Montreal", "Montreal requesting any city", and "any 
city requesting itself". From these data it was possible to 
estimate the parameters of a parametric model predicting the 
probability of a request for information being made for any 
target city given the calling (NPA) NXX. The parameters of the 
model proposed are the called city's population and the distance 
between the two cities. Where o is the originating locality, d
is the destination locality, and S is the size of a locality,
then the likelihood of a request about d given o is

$L(d \mid o) = S(d) \, f(|o - d|)$

The normalized likelihood is

$\hat{L}(d \mid o) = 0.60 \, \frac{L(d \mid o)}{\sum_{d'} L(d' \mid o)}$

where the sum is taken over all d'.

When the destination city is also the origin city, the
likelihood is higher, so this is treated as a special case.

It is assumed that 60% of DA requests are placed to
localities including the originating locality as governed by the
equation above, and an additional 40% of DA requests are for the
originating city, giving

$P(d \mid o) = \hat{L}(d \mid o)$, for $d \neq o$
$P(d \mid o) = \hat{L}(d \mid o) + 0.40$, for $d = o$
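
The 60%/40% split can be checked with a short sketch. It assumes that
sizes maps every locality, including the origin, to its size S, and that
f and dist are supplied by the caller; all of these names are
hypothetical.

```python
def request_probabilities(origin, sizes, f, dist):
    """Return {d: P(d | origin)} using the 60%/40% split described above.

    sizes: dict mapping locality -> size S(d); must include the origin.
    f:     distance weighting function.
    dist:  callable (o, d) -> distance between the two localities.
    Assumes at least one locality has a non-zero raw likelihood.
    """
    raw = {d: sizes[d] * f(dist(origin, d)) for d in sizes}
    total = sum(raw.values())
    # 60% of requests are spread over all localities by normalized likelihood.
    probs = {d: 0.60 * raw[d] / total for d in sizes}
    # The remaining 40% are requests for the originating city itself.
    probs[origin] += 0.40
    return probs
```

Because the normalized likelihoods sum to 0.60, the returned
probabilities sum to 1 for any origin.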



Intuitively, the function f(o,d) varies inversely with the
distance between cities. In order to better define the function,
a table of discrete values for certain distance ranges was
derived from community of interest data collected in the three
Quebec area codes. The distance units used in this section are
the ones used by Bellcore to maintain geographical locality
coordinates in North America. One kilometre is roughly equal to
1.83 Bellcore units.

The discrete function values f computed for a given distance
range in the province of Quebec are given in the Table below for
each area code. Since the goal was to obtain an a priori model
for the entire province, the values for f(o,d) were computed for
the province as a whole through factoring in the probability of
a call originating in each area code. This was estimated to be
in proportion to the number of NXX's per area code relative to
the province as a whole.
This gives

$f(\mathrm{Province}) = 0.40 \, f(514) + 0.27 \, f(819) + 0.33 \, f(418)$



distance    514    819    418    Province
0-25        1.0    1.0    1.0    1.00
26-50       0.9    0.3    0.7    0.67
51-75       0.4    0.0    0.2    0.23
76-100      0.1    0.0    0.3    0.14
101-125     0.1    0.0    0.1    0.07
126-150     0.1    0.0    0.1    0.07
151-175     0.0    0.0    0.2    0.07
176-200     0.0    0.0    0.0    0.00
>200        0.0    0.0    0.0    0.00
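
The Province column can be reproduced from the three area-code columns
with the weights 0.40, 0.27 and 0.33. The sketch below is only a check
of that arithmetic; the variable names are illustrative.

```python
# Weights: share of NXX's per area code, as given in the text.
WEIGHTS = {"514": 0.40, "819": 0.27, "418": 0.33}

# Per-range values of f for area codes 514, 819 and 418 (from the Table).
F_BY_RANGE = {
    "0-25": (1.0, 1.0, 1.0),
    "26-50": (0.9, 0.3, 0.7),
    "51-75": (0.4, 0.0, 0.2),
    "76-100": (0.1, 0.0, 0.3),
    "101-125": (0.1, 0.0, 0.1),
    "126-150": (0.1, 0.0, 0.1),
    "151-175": (0.0, 0.0, 0.2),
    "176-200": (0.0, 0.0, 0.0),
    ">200": (0.0, 0.0, 0.0),
}


def province_value(f514, f819, f418):
    """Weighted average of the three area-code values."""
    return (WEIGHTS["514"] * f514
            + WEIGHTS["819"] * f819
            + WEIGHTS["418"] * f418)


# Province column, rounded to two decimals as in the Table.
PROVINCE = {rng: round(province_value(*vals), 2)
            for rng, vals in F_BY_RANGE.items()}
```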



Given the sparseness of data, the model for obtaining
weights as a function of distance was converted from
nonparametric to parametric. For this purpose, a least squares
fit was performed on the data from ranges 26-50 to 151-175. The
distance value used in the fitting process was the median
distance in each range. An analysis of various function forms
for the regression showed that the form below provided the
closest fit to the collected data:

$f'(\mathrm{distance}) = A / \mathrm{distance} + B$

The best coefficients obtained from the analysis were

A = 33.665
B = -0.305

This function reaches zero when the distance is equal to
196. In order not to eliminate the possibility of handling a DA
request when the distance was greater than this value, the
function was modified to accommodate distances of up to twice the
maximum distance between any pair of cities with population
10,000 or greater in the province. The two most distant cities
that matched this criterion were Rouyn-Noranda and Gaspe at a
distance of 2,103 units. The maximum distance at which f would
be zero was set to be 4,201 distance units. The function
switches to a negative slope linear function at the point where
the predicted value of f is 0.01. This corresponds to a distance
value of 167.

The final f becomes

$f(d) = \min(1, (33.65/d) - 0.305)$, for $d \le 167$
$f(d) = 0.01$, for $d > 167$

The fit of this model to the collected data, labelled
"nonparametric model", is shown in Figure 5.

In order to determine the effects of the a priori model on
recognition rate, the model was applied to simulated DA requests,
and each token in the test set was rescored to take a priori
likelihood into account. The function used for rescoring was

$\text{weighted score} = nas + K \log\{P(d \mid o)\}$,

where nas is the normalized acoustic score, that is, the acoustic
score divided by the number of frames in the utterance. The
proportionality
constant K was trained to maximize the recognition rate over the
province of Quebec. The distribution of tokens in the test set
is normalized to be that predicted by the a priori model. For
this reason a correctly recognised simulated DA request from a
city to the same city carries more weight when computing
recognition rate than does a request for a small distant city
with a correspondingly low a priori probability. A recognition
rate was thus determined per city and then the overall provincial
recognition rate was computed by taking the sum of the rate for
all cities in proportion to their respective populations. The
only assumption made in applying the model was that the calling
NPA/NXX is known, which allows the utterance to be rescored by
mapping it to all cities corresponding to the given entry in the
Bellcore database.
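
A minimal sketch of the rescoring step follows. It assumes that each
candidate carries a raw acoustic score, a frame count and a strictly
positive a priori probability; the function names, the tuple layout and
the value of K in the usage note are hypothetical.

```python
import math


def weighted_score(acoustic_score, n_frames, prior, k):
    """nas + K * log(P(d|o)), with nas = acoustic score / number of frames."""
    nas = acoustic_score / n_frames
    return nas + k * math.log(prior)  # prior must be > 0


def best_candidate(candidates, k):
    """Pick the candidate with the highest weighted score.

    candidates: iterable of (name, acoustic_score, n_frames, prior).
    """
    return max(
        candidates,
        key=lambda c: weighted_score(c[1], c[2], c[3], k),
    )[0]
```

With made-up candidates ("city_a", -200.0, 100, 0.30) and
("city_b", -190.0, 100, 0.01), acoustics alone (K = 0) favour city_b,
while K = 0.1 favours city_a because of its much higher prior.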

The a priori model was further refined in order to avoid
favouring the bigger cities unduly, as the recognition rate on
these based on acoustics alone was already above average. For
this purpose, constants were introduced in the model
corresponding to population ranges over the target cities in
order to reduce the effective populations. These constants were
not applied to the modelled distribution of requests since this
would invalidate the method for computing the provincial
recognition rate. The function defining likelihood becomes

$L(d \mid o) = K_{r(d)} \, S(d) \, f(|o - d|)$

where r(d) is a range of destination locality population for
which the constant K applies. The best ranges and their
associated constants were then determined empirically from a
development set.

Thus, using a priori call distribution, and flexible voice 
recognition, embodiments of the present invention are capable of 
automating at least the front end of a directory assistance call 
and in some cases the entire call.

The embodiment of the invention described above is by way
of example only. Various modifications of, and alternatives to,
its features are possible without departing from the scope of the
present invention. For example, the voice processing system
might be unilingual or multilingual rather than bilingual. The
a priori probabilities need not be geographical but might be
determined in other ways. For example, they might be determined
according to time-of-day or season of year, or determined with
reference to a history of calls placed by a particular caller or
callers.

THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE RIGHT OR
PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:

1. Apparatus for at least partially automating directory
assistance in a telephone system, comprising a voice processing
unit comprising a database of vocabulary items and data
representing a predetermined relationship between each vocabulary
item and a calling number in a location served by the automated
directory assistance apparatus, means for issuing messages to a
caller making a directory assistance call to prompt the caller
to utter a required one of said vocabulary items, means for
detecting a calling number originating a directory assistance
call, means responsive to the calling number and said data for
computing a probability index representing the likelihood of a
vocabulary item being the subject of the directory assistance
call, and speech recognition means for recognizing, on the basis
of the acoustics of the caller's utterance and the probability
index, a vocabulary item corresponding to that uttered by the
caller.

2. Apparatus as claimed in claim 1, further comprising means
for transmitting a message to the caller giving the required
directory number corresponding to the vocabulary item.

3. Apparatus for at least partially automating directory
assistance in a telephone system, including voice processing
means for issuing to a directory assistance caller a message
inviting the caller to utter the name of a location, recognizing
the place name from the utterance, determining whether or not the
location is within the area served by the automatic directory
assistance apparatus and, in the event that it is not, playing
a message to the caller inviting the caller to direct the
directory assistance request to an alternative locality.

4. A method of at least partially automating directory
assistance in a telephone system which comprises a voice processing
unit comprising a database of vocabulary items and data
representing a predetermined relationship between each vocabulary
item and a calling number in a location served by the automated
directory assistance apparatus, the method comprising the steps of
issuing messages to a caller making a directory assistance call to
prompt the caller to utter a required one of said vocabulary items,
detecting a calling number originating a directory assistance
call, computing, in response to the calling number and said data,
a probability index representing the likelihood of a vocabulary
item being the subject of the directory assistance call, and
employing speech recognition means to recognize, on the basis of
the acoustics of the caller's utterance and the probability
index, a vocabulary item corresponding to that uttered by the
caller.

METHOD AND SYSTEM FOR HOME INCARCERATION



Technical Field 

The present invention relates to remote
verification of the presence of a particular individual
within a predetermined confinement area, broadly
described as a home incarceration system. More
particularly, the invention is directed to a method and
system for remotely confirming that the individual is at
the prescribed premises by communicating with the
individual via a telephone network, identifying the
location by utilizing caller line identification and
identifying the individual by voice identification
speech processing.

Background Art 

The concept of home incarceration has evolved as an
alternative to detention in government jail and prison
facilities. In cases of relatively light infractions,
offenders, rather than being placed as inmates in
overcrowded facilities, are confined to predetermined
limited geographical areas including, for example, homes
and workplaces. The burden on the prison system is
relieved by freeing more space for criminals convicted
of more serious crimes. Cost efficiency is also a
significant factor as the expense of incarceration in
such a facility is quite high. The degree of severity
of punishment and the prospects of rehabilitation of the
light offender are more appropriate to a home
incarceration environment than to a prison provided for
felons.

In a "house arrest" situation, the detainees, of
course, are more likely to interact with the community.
Public security is a socially sensitive issue and it is
important that the activities of captives be monitored
and supervised. The whereabouts and identity of
individuals should be capable of being established at
any time without the necessity of assignment of a law
enforcement officer for constant surveillance on a
one-to-one basis.

A prior art monitoring arrangement is shown in Fig.
1. A bracelet 20 is worn on the wrist or ankle of the
detainee. A radio transmitter 22 broadcasts a coded
signal which is received at a base 24. The base may be
stationary or mobile. Verification of the received
coded signal is performed at the base as indicated in
block 25. Inasmuch as the signal has a limited range,
reception of the signal at the base is indicative that
the bracelet, and presumably the detainee, is within the
defined area of confinement. The signal may be
continuously or selectively generated.

The base is under the control of a processor
through a telephone line 30. The processor may be part
of a local area network including a file server 32
having data base information of all detainees in the
system. At any time the system may call, via the
telephone, the confinement site and ask for
verification. Telephone calls may be made randomly or
at scheduled intervals determined by the system. If the
signal is to be continuous and the base senses an
interruption in the signal, the system will initiate a
call for verification.

During a call, the detainee is requested to
position the bracelet appropriately near the
transmitter. The transmitter then picks up the code
from the bracelet and transmits it back to the base.
If the transmitter is beyond the range of the base, or
if the code is not verified, the base can initiate a
call to the system processor to indicate that the
detainee is not responding or has not been verified.

A similar prior art arrangement is disclosed in
U.S. Patent No. 4,747,120. A bracelet capable of
generating a coded signal is worn by the person to be
monitored. A decoder, connected with a telephone, can
decode the signal when the bracelet is appropriately
positioned adjacent the decoder and the decoded signal
can be transmitted over the telephone network to the
remote system.

The above described arrangements, intended for
selective or continuous personnel monitoring, have
inherent disadvantages. In the prior art embodiment of
Fig. 1, lengthy interruptions in signal transmission can
be caused by various sources of interference. As a
result, the base may give frequent false indications of
nonverification, requiring human intervention. Where
the coded signal is transmitted by the phone line,
rather than by radio transmission, continuous monitoring
is impractical, as an on line connection must be
continuously maintained for each person monitored.

A phone call by the system to the confinement site
for purposes of verification will not be productive
during periods in which radio transmission is
interrupted by interference. As a backup for such
instances, monitoring personnel may attempt to identify
the voice of the called party during the telephone
conversation. The listener would be required either to
know the confinee personally or be familiar with voice
recordings of the individual to be verified. Such
identification attempts likely would not be successful
if the system serves a large number of detainees or if
the speech of the called party is slurred by the
influence of drug or alcohol abuse. Enforcement
personnel frequently must be dispatched to the
confinement sites to resolve the issue.

A further drawback of these systems is that the
coded signal may be verified without complete assurance
that the signal emanates from the location of
confinement. In the case of radio transmission to the
base, while the transmission range may be limited, the
range may nevertheless extend beyond the bounds of
confinement. In the case of telephone transmission, the
system may be thwarted by placement of a decoder at a
telephone, which is provided with call forward service,
in an unauthorized area. A call placed by the system to
the site of incarceration could be call forwarded to the
unauthorized area and the code would be verified,
falsely indicating that the detainee is identified and
present at the appropriate location.

A further complication in these systems involves
the physical structure of the bracelet. Bracelets must
be constructed to resist tampering. The device must be
affixed to the particular individual so that the
identity of that person can be assured when receiving
the signal transmission. The device is cumbersome in
order to prevent easy removal. In addition, each
bracelet must have a self-contained power supply
sufficient for operation over an extended time period.

U.S. Patent No. 4,843,377 contemplates the use of a
voiceprint as a means for remote identification of a
prisoner. Audio spectral analysis is performed and
applied to speech transmitted over a telephone line to
determine a match with a probationer's voiceprint.
Several commercially available systems are discussed.

While voice analysis may be a reliable means to
determine the identity of an individual, such a system,
in itself, cannot verify that the individual is at the
prescribed location. Call forwarding, in the network or
on the premises, can result in the appearance of a party
being in the prescribed location, while in fact being
elsewhere.

Disclosure of the Invention 

Accordingly, an object of the invention is to
provide a home incarceration system having the
capability of remotely verifying the identity of an
individual and the location of the individual at any
time.

Another object of the invention is to provide a
home incarceration system that is not subject to false
indications of nonverification which may result from
outside interference.

A further object of the invention is to enable
remote monitoring of detainees in confinement without
the necessity of a device that may be subject to
physical tampering or breakdown.

Yet another object of the invention is to enable
simultaneous, remote and automatic monitoring of a large
number of confinees while requiring a minimum number of
monitoring personnel. A related object is to provide an
automatic warning or message to remote personnel in the
event of a system determination that a confinee is
absent from the required location.

A further object is to permit multiple legitimate
sites of incarceration based, for example, on time of
day or week and which can be easily verified by known
telephone number.

An additional object of the invention is to provide
a plurality of local control centers, each serving a
home incarceration monitoring and control function for a
prescribed geographical area and having the capability
to selectively transfer the functions of any particular
control center to another local center or a master
network center for prescribed time periods.

The above and other objects of the invention are
satisfied in part by providing a telephone communication
network linking each confinement location to a remote
home incarceration center. The system includes a
controller and storage at the control center or at a
remote location linked thereto. The system maintains a
database of inmates currently included in the program.
The prisoner database includes each inmate's name,
telephone number of the site of incarceration, and date
and period of incarceration. In the event that the
inmate is permitted a work schedule, the database
includes the telephone numbers of each permitted
location and corresponding scheduled time periods.

When a new inmate is added to the system, the
inmate is escorted to the incarceration location by
civil authorities. Once there, telephone communication
is established with the home incarceration center to
establish an identity for the inmate. Voice training is
undertaken to establish voice templates for the
individual inmate. A variety of words are selected to
form a test vocabulary. The words are recited by the
inmate from the incarceration site and transmitted to
the incarceration center where a voice template for each
word is created and stored. This procedure avoids any
detrimental influence resulting from variations in
telephone transmission characteristics from different
origins.

Once in the system database, and with the voice
templates established, the inmate is subjected to
periodic testing. Testing may be performed at
predetermined schedules and at random intervals. A test
is initiated by retrieving the inmate's number from the
database and calling the incarceration site. An
announcement is then made, requesting the inmate to call
back in to the home incarceration center within a fixed
time period to conduct the voice identification test.
The system will be prepared to accept the incoming call.
Caller line identification at the control center
determines if the return call is made from the
incarceration site. During the call, the inmate is
required to recite a statement, prepared at the
incarceration control center, including randomly chosen
words from the template vocabulary. Comparison is made,
using speech analysis, between the recited statement and
the stored templates. As the statement is unknown to
the inmate in advance, an attempt to use a voice
recording as a response, with the inmate absent, would
be futile.
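
The call-back test described above might be sketched as follows. This is
an illustration only: matches_template is a hypothetical stand-in for
the speech analysis against stored templates, and all other names are
likewise assumptions rather than part of the disclosed system.

```python
import random


def build_challenge(vocabulary, n_words, rng):
    """Pick n_words distinct words at random from the template vocabulary."""
    return rng.sample(vocabulary, n_words)


def verify_callback(caller_number, site_number, challenge, utterances,
                    matches_template):
    """Return True only if both the location and the voice check pass.

    caller_number:    number reported by caller line identification.
    site_number:      stored number of the incarceration site.
    challenge:        words the inmate was asked to recite.
    utterances:       one recorded utterance per challenge word.
    matches_template: hypothetical callable (word, utterance) -> bool.
    """
    # Location check: the return call must come from the stored site.
    if caller_number != site_number:
        return False
    if len(utterances) != len(challenge):
        return False
    # Identity check: every recited word must match its stored template.
    return all(matches_template(word, utt)
               for word, utt in zip(challenge, utterances))
```

Drawing the words at random for each test is what makes a prepared
recording ineffective, as noted in the text.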

The testing can be controlled manually at the 
incarceration center or be handled completely 
automatically. In an automated test procedure, the 
system would send notification, visibly or audibly, to 
an administrator of any test failures. Such 
notification may be transmitted through the network to 
the administrator at a location remote from the test 
center. A log file is maintained by the system for the 
purpose of recording all activity by the system, whether 
manually or automatically instituted. 

The system can operate in the environment on one or 
more sites of the law enforcement authority premises on 
a dedicated line basis. Alternatively, a single system 
can be shared on a network basis by several law 
enforcement agencies by appropriate partitioning. An 
additional aspect of the invention is call forwarding 
control by one center to another for "after hours" 
monitoring or for other purposes. 

Additional objects and advantages of the present 
invention will become readily apparent to those skilled 
in this art from the following detailed description, 
wherein only the preferred embodiment of the invention 
is shown and described, simply by way of illustration of 
the best mode contemplated for carrying out the 
invention. As will be realized, the invention is 
capable of other and different embodiments, and its 
several details are capable of modifications in various 
obvious respects, all without departing from the 
invention. Accordingly, the drawings and description 
are to be regarded as illustrative in nature, and not as 
restrictive. 

Brief Description of Drawings 

Fig. 1 is a block diagram illustrating a prior art 
monitoring system. 

Fig. 2 is a block diagram of a system according to 
the present invention, depicting a network including a 
control office and controller within the network. 

Fig. 3 is a block diagram according to the present 
invention, representing broadly the components of the 
system including the control functions. 

Fig. 4 is a function map of the home incarceration 
architecture of the present invention. 

Fig. 5 is a function map of the speaker 
verification process of the present invention. 

Fig. 6 is a network layout of the present 
invention, illustrating transfer capability among 
various geographically separated control centers. 

Best Mode for Carrying out the Invention 

Fig. 2 broadly illustrates a home incarceration 
system including incarceration control center 48. An 
adjunct processor/server is shown at 50 interconnected 
through a network 52 to basic incarceration controllers 
54, only one of which is represented in the figure for 
simplicity of illustration. Network 52 may be a local 
area network, such as ethernet, or a wide area network, 
such as a private line T1 network. The basic 
incarceration controller is shown connected, via 
telephone facilities (e.g., lines), to a central office 
56, which is also connected via telephone lines to a 
particular local incarceration site 58. In addition to 
serving the area of central office 56, control center 48 
may be extended to include additional central office 
areas such as the central office 60, shown connected 
therewith through interoffice facility line 62. 

The basic incarceration controller includes 
telephone interfaces, a processor and voice 
processing/response capabilities, using appropriate 
hardware and software. The basic incarceration 
controller may also include storage for the inmate 
database and speech templates to perform all control 
functions as an independent unit. The system capacity 
can be extended beyond the limits of the basic 
incarceration controller by the adjunct 
processor/server, which includes memory storage and 
program control for the basic incarceration controller. 

Fig. 3 is a further development of the control 
elements of the system shown in connection with central 
office 56. The central office is connected to 
subscriber loop connector 70 and modem 74 through bridge 
circuit 72. Bridge circuit 72 allows an incoming call 
to be split, permitting the incoming call signals to be 
applied to the subscriber loop connector 70 and modem 
74. The subscriber loop connector is connected to a 
communications port, not shown, in controller 78 through 
amplifier/filter circuit 80. The modem 74 is connected 
to an originating call identification device 76. 

The subscriber loop connector is a well known unit 
that performs both incoming and outgoing call functions. 
This unit serves to control the telephone connection 
with the central office and user telephone line. For 
incoming calls, for example, the unit detects ringing, 
on-hook and off-hook. The unit, under direction of the 
controller, performs outgoing call functions. These 
functions, in an alternative embodiment, can be 
incorporated into an appropriate function board in the 
system microprocessor. 

In operation, for incoming calls block 76 
identifies the originating telephone number from 
information transmitted between the first and second 
ringing signals. A detailed description of the 
preferred composition of this device is contained in 
U.S. application serial number 07/515,027, filed April 
26, 1990, which application is herein incorporated by 
reference. Alternatively, the functions of block 76 can 
be incorporated in the control program. 

The subscriber loop connector establishes off-hook 
connection between the central office and the controller 
after the second ring. At this time the calling line 
has been identified by block 76 and this information is 
transmitted to the controller 78. The controller 
includes a processor for comparing the caller 
identification information with the stored database 
information associated with the inmates. 

The processor also performs speech analysis, 
comparing the transmitted voice of the caller with 
stored voice templates. Speech and voice processing may 
be performed in accordance with technology well known in 
the art. Examples of suitable speech verification 
algorithms may be found in "Digital Processing of Speech 
Signals," by Rabiner and Schafer, Bell Laboratories, 
Prentice-Hall, Inc. 1978, particularly at page 457. 
Further description of voiceprint analysis for voice 
identification may be found in U.S. patent number 
3,525,811. 

The filters and amplifiers forming block 80 
condition the transmitted audio signals to limit the 
band width and strengthen the signals appropriately to 
the requirements of the processing to be performed in 
controller 78. Two way voice communication is 
transmitted between the incarceration site and the 
controller through the path including the central 
office, bridge circuit, subscriber loop connector, and 
amplifier/filter circuit. 

Upon entering a new inmate into the system, a 
telephone call is established whereby "voice training" 
is performed. Voice templates of a selected word 
vocabulary are created and stored in the data base. The 
data base also includes the inmate's telephone number at 
home, work or other permitted location, scheduled hours 
at each permitted location, telephone number of 
probation officer or other official to be notified in case 
of a violation, and any other pertinent information. 
The data base can be updated at any time without 
interrupting the calling activity of the system. 

Testing is performed by calling the telephone at 
which the inmate is scheduled to be and requiring the 
inmate to call back to the control center. In the 
return call, the caller line is identified by block 76 
and the inmate is required to repeat a statement 
including selected words from the template vocabulary. 
Verification of the caller's voice is made by comparison 
with the stored templates, using the voice 
analysis techniques described above. Dynamic adaptive 
updating of the templates may be periodically performed 
upon successful voice verification. 
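
The two checks made on a return call (calling line identification and voice comparison) can be sketched as below. This is a minimal illustration under stated assumptions: the `InmateRecord` layout, the representation of a template as a short list of numbers, the Euclidean distance, and the threshold value are all hypothetical stand-ins. The disclosure leaves the actual speech analysis to known techniques such as those in the cited references.

```python
from dataclasses import dataclass, field


@dataclass
class InmateRecord:
    # Hypothetical record layout; the field names are assumptions.
    permitted_numbers: set = field(default_factory=set)
    templates: dict = field(default_factory=dict)  # word -> feature vector


def euclidean(a, b):
    """Stand-in distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def verify_call(record, calling_number, challenge, spoken_features,
                threshold=1.0):
    """Return (line_ok, voice_ok) for one return call.

    challenge       -- words the caller was asked to recite
    spoken_features -- one feature vector per recited word, same order
    """
    line_ok = calling_number in record.permitted_numbers
    voice_ok = len(spoken_features) == len(challenge) and all(
        euclidean(record.templates[word], feats) <= threshold
        for word, feats in zip(challenge, spoken_features)
    )
    return line_ok, voice_ok
```

A test would be treated as passed only when both values are true; either failure alone is a violation to be reported.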

Calls may be placed by the control center on the 
basis of a predetermined schedule as well as randomly. 
The system has flexibility to determine frequency of 
random calls made per day and to change the frequency 
for each inmate as deemed appropriate. For example, 
inmates who have violated curfew might be assigned a 
higher frequency of random calls. In addition, inmates 
may be required to call in regularly at predetermined 
times. 
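
One way the per-inmate random call frequency might be realized is sketched here. The function name, the uniform choice of call times within a window, and the one-second resolution are assumptions for illustration; the disclosure says only that the number of random calls per day is adjustable for each inmate.

```python
import random
from datetime import datetime, timedelta


def random_call_times(window_start, window_end, calls_per_day, rng=None):
    """Return sorted random call times inside [window_start, window_end).

    Hypothetical scheduler: times are drawn uniformly, to the second.
    """
    rng = rng or random.Random()
    span = int((window_end - window_start).total_seconds())
    if span <= 0:
        raise ValueError("window_end must be after window_start")
    offsets = sorted(rng.randrange(span) for _ in range(calls_per_day))
    return [window_start + timedelta(seconds=s) for s in offsets]


if __name__ == "__main__":
    start = datetime(1992, 1, 1, 18, 0)
    end = datetime(1992, 1, 2, 6, 0)
    # An inmate with a prior curfew violation might get more calls.
    for t in random_call_times(start, end, calls_per_day=5):
        print(t.isoformat())
```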

Violations are reported automatically to 
administrative personnel by transmission of a message 
to a remote printer or terminal. Notification may also 
be effected by audible alarm, paging or delivery of a 
voice mail message. All activity is recorded in a log 
file maintained in the system. 

The home incarceration monitoring scheme may 
include a continuous signalling device worn by the 
inmate as part of a hybrid system. This added 
redundancy would make the home incarceration concept 
more socially acceptable as well as afford continuous 
surveillance. During times in which broadcast 
transmission of the continuous signal is interrupted by 
interference or other instances in which no signal is 
received, the system can initiate a call to the inmate 
for verification by calling line identification and 
voice analysis. 

Figs. 4 and 5 are charts illustrating the functions 
of the system including initialization, database 
administration, training and testing. These functions 
are under the control of a main program executed by the 
system processor. Database administration includes 
adding and deleting information, as well as a print 
capability. In Fig. 5, speech processing includes voice 
training to create templates and testing, using the 
templates and transmitted speech. 

Calling party number identification may be obtained 
through ISDN or analog lines equipped with caller line 
identification or similar services. Number 
identification can also be transmitted using out of band 
signaling, packet switching or Simplified Message 
Service Interface (SMSI), ISDN primary rate access and 
bulk calling line identification. In some cases a trunk 
arrangement may be used in a PBX environment. 

Fig. 6 illustrates an intelligent network 
application of the home incarceration service. Central 
offices 56, each serving end user incarceration sites 
58, are shown interconnected with each other. A local 
control center 54 may be customer premises equipment or 
network based and can perform voice verification and 
caller party number identification. Similarly, a larger 
area control center, which may be customer premises 
equipment or network based, is shown at 55. 

Sufficient hardware and software to serve the 
entire system is provided at network base processor 50, 
which may be used in conjunction with signal control 
point 80. The signal control point is attached to the 
network through signal transfer point 82 to monitor all 
signaling within the network and to intelligently 
control the action to be taken based on the signal. 
Additional signal transfer points may be included to 
accommodate network size. 

The signal transfer point is connected to each of 
the central offices through SS7 or other data links for 
database information transfer. The local control center 
may be operational for limited hours. Transfer of the 
functions of this center for after hours coverage can be 
made to the area control center under control of the 
network base processor via the signal transfer point or 
by call forwarding from the local office. 

Only the preferred embodiment of the invention and 
but a few examples of its versatility are shown and 
described in the present disclosure. It is to be 
understood that the invention is capable of use in 
various other combinations and environments and is 
capable of changes or modifications within the scope of 
the inventive concept as expressed herein. 

WHAT IS CLAIMED IS: 

1. A method for remotely verifying, at a 
verification site, the attendance of a particular person 
at a predetermined area, said area being provided with a 
telephone, comprising the steps of: 

establishing an on line telephone connection 
between said verification site and said telephone; 

determining a voice characteristic of said person 
at said verification site in response to speech 
transmission through said telephone connection; and 

testing for the presence of said person at said 
area, said step of testing comprising the steps of: 

identifying a calling telephone line 
directory number in response to an incoming 
telephone call; 

establishing an on line connection for 
said incoming telephone call; and 

analyzing a voice transmitted during said 
incoming call. 

2. A method as recited in claim 1, wherein said 
step of testing further comprises a step of determining 
whether the identified telephone line directory number 
corresponds to a predetermined line directory number 
associated with said telephone. 

3. A method as recited in claim 2, wherein said 
step of testing is selectively performed at random times 
and further comprises: 

calling said telephone from said verification site; 

establishing a further on line connection between 
said verification site and said telephone; 

requiring a return call from said telephone to said 
verification site within a set time period after 
termination of said further on line connection; and 

determining whether a return call has been 
established within said set time period. 

4. A method as recited in claim 3, further 
comprising the step of generating a warning indication 
upon determination that a return call has not been 
established within said set time period. 

5. A method as recited in claim 2, wherein said 
step of determining a voice characteristic comprises: 

selecting a plurality of words to form a voice 
vocabulary; 

creating a voice template for each word of said 
vocabulary as spoken by said person in said speech 
transmission; and 

storing the templates created. 

6. A method as recited in claim 5, wherein said 
step of testing further comprises: 

preparing a statement containing one or more words 
included in said vocabulary; and 

requiring the caller of said incoming call to 
recite said statement; 

and said step of analyzing comprises comparing the 
recited statement with the stored templates. 

7. A method as recited in claim 6, wherein said 
step of determining a voice characteristic further 
comprises dynamically updating the stored templates. 

8. A method as recited in claim 5, further 
including the step of generating a warning indication 
upon a condition that either said identified telephone 
line does not correspond to a line associated with said 
telephone or that the voice transmitted during said 
incoming call does not match said stored templates. 

9. A method as recited in claim 8, wherein said 
generating step comprises displaying a message on a 
terminal. 

10. A method as recited in claim 9, wherein said 
terminal is remote from said verification site. 

11. A method as recited in claim 8, wherein said 
generating step comprises printing out a message. 

12. A method as recited in claim 8, wherein said 
generating step comprises transmitting a warning 
message. 

13. A method as recited in claim 12, wherein said 
message is a paging communication. 

14. A method as recited in claim 12, wherein said 
message is a voice mail message. 

15. A method as recited in claim 12, wherein said 
warning message comprises an audible alarm. 

16. A method as recited in claim 12, wherein said 
step of transmitting a warning message comprises 
automatically generating a radio dispatch to a patrol 
vehicle. 

17. A method as recited in claim 6, further 
comprising storing results of the tests performed. 

18. A system for monitoring at one or more remote 
locations the presence or absence of a particular person 
within a defined area comprising: 

a telephone at said defined area; 

verification means remote from said area for 
verifying the identity of an individual at said area; 
and 

a communications network for establishing 
communication between said telephone and said 
verification means; 

said verification means comprising: 

voice processing means for analyzing an 
incoming voice transmission from said 
communications network; and 

caller line directory number 
identification means for identifying an 
incoming caller telephone line. 

19. A system as recited in claim 18, wherein said 
voice processing means comprises: 

means for creating voice templates of a preselected 
word vocabulary for said person; 

means for storing said voice templates; and 

means for comparing spoken words of said voice 
transmission with said voice templates. 

20. A system as recited in claim 12, wherein said 
verification means further comprises storage means for 
storing information including identification of said 
telephone line directory number as a reference for 
comparison with incoming caller line directory number 
identification whereby origination of an incoming call 
from said area may be verified. 

21. A system as recited in claim 18, wherein said 
verification means further comprises means for 
generating a warning indication if said incoming caller 
line directory number identification does not correspond 
to said stored telephone line directory number 
identification or if said spoken words of said voice 
transmission does not match the stored voice templates. 

22. A system as recited in claim 21, wherein said 
means for generating includes a display terminal. 

23. A system as recited in claim 21, wherein said 
means for generating includes a printer. 

24. A system as recited in claim 20, including two 
or more telephones at geographically separated locations 
within said defined area, said storing means including 
stored identification of each of said telephones. 

25. A system as recited in claim 18, including a 
plurality of said verification means connected to said 
communications network at separated locations, each of 
said verification means capable of monitoring a 
plurality of identified persons at various locations 
within a distinctly defined area. 

26. A system as recited in claim 25, wherein said 
communications network comprises a plurality of signal 
transfer points and call forward means for transferring 
verification operation from one of said verification 
means to another of said verification means through a 
signal transfer point in said network. 

27. A system as recited in claim 18, further 
including means affixed to said person for transmitting 
a continuous signal and means for monitoring said 
continuous signal. 



WO 93/05605    PCT/US92/07645 

[Drawing sheets: FIG. 1 (PRIOR ART) through FIG. 6. Legible block labels include ADJUNCT PROCESSOR/SERVER (50), INCARCERATION CONTROL CENTER (48), BASIC INCARCERATION CONTROLLER (54), MAIN PROGRAM, INITIALIZATION, FUNCTION PROMPTING, TRAINING, TESTING and SPEECH PROCESSING.] 



INTERNATIONAL SEARCH REPORT 

International application No. PCT/US92/07645 

A. CLASSIFICATION OF SUBJECT MATTER 
IPC(5): H04M 11/04 
US CL: 379/38 
According to International Patent Classification (IPC) or to both national classification and IPC 

B. FIELDS SEARCHED 
Minimum documentation searched (classification system followed by classification symbols) 
U.S.: 379/42, 49, [illegible], 142; 340/[illegible] 

Electronic data base consulted during the international search (name of data base and, where practicable, search terms used) 
none 

C. DOCUMENTS CONSIDERED TO BE RELEVANT 

Category   Citation of document                                  Relevant to claim No. 
A          US, A, 5,023,901 (SLOAN ET AL), 11 June 1991.         1-27 
           See entire document. 
A,P        US, A, 5,054,055 (HANLE ET AL), 01 October 1991.      1-27 
           See entire document. 

Date of the actual completion of the international search: 18 November 1992 

Name and mailing address of the ISA/ 
Commissioner of Patents and Trademarks 
Box PCT 
Washington, D.C. 20231 
Facsimile No. NOT APPLICABLE 

Authorized officer: WING F. CHAN 
Telephone No. (703) 305-4732 

Form PCT/ISA/210 (second sheet) (July 1992) 



N-ARY JOIN FOR PROCESSING QUERY BY EXAMPLE 
K. E. Niebuhr and S. E. Smith 



[Fig. 1: flow chart of the top level n-ary join control. Legible box labels: START; CQT = TOL, RETRIEVE MODE = 'START', FINISHED = 'NO'; FINISHED?; STOP; RETRIEVE NEXT DATA TUPLE FROM CQT; RETRIEVE SUCCESSFUL?; CQT = TOL?; CQT = BOL?; SET CQT TO NEXT HIGHER ON SEARCH LIST; SET CQT TO NEXT LOWER ON SEARCH LIST; MAP PRINT ELEMENTS TO OUTPUT RELATIONS; SET CQT TO LOWEST PRINT QUERY TUPLE ON SEARCH LIST; RETRIEVE MODE = 'START'; RETRIEVE MODE = 'RESUME'; FINISHED = 'YES'.] 



Given a relational data base, the present method provides an efficient 
way of performing the join operations needed to solve a query-by-example 
query. In an earlier publication [1], the joining was accomplished by a 
sequence of binary joins, each of which produced a temporary relation 
which could be involved in subsequent joins. That approach had the 
following potential disadvantages: 

(1) Overhead associated with storing the data tuples in the temporary 
relations resulting from the joins. 

(2) Because all possible solutions were carried throughout the sequence 
of joins, the cardinalities of the temporary relations could become 
very large for a modest size data base. 

(3) Inversions were not available in the temporary results so they had 
to be created, if needed. Moreover, existing inversions were 
frequently not used. 

The proposed method overcomes all of these shortcomings with a single 
operation which is called the n-ary join. 

Terminology 

A. Query Tuples and Data Tuples 

It is necessary to distinguish between two kinds of tuples. The 
user constructs a query-by-example query in the form of "query tuples" 
[1, 2]. Underlying each skeleton table and 
its query tuples is a relation containing "data tuples". The query 
tuples specify the constraints which data tuples must satisfy in 
order to be included in the answer to a query. 

B. Selection and Linkage Constraints 

The data tuples which comprise the answer to a query must satisfy 
two kinds of constraints: "selection" constraints and "linkage" 
constraints. Selection constraints are intra-tuple conditions specified by 
a query tuple. Examples of selection constraints are: 

*Column 3 must have a value equal to 10. 

*The value in column 5 must be greater than the value in column 2. 

In general, a selection constraint is a function over the entries in a 
single data tuple which evaluates to true or false. 

Linkage constraints are inter-tuple conditions specified between 
multiple query tuples. Examples of linkage constraints are: 

*The value in column 3 of the data tuple in R2 must be greater than 
the value of column 5 of the data tuple currently specified from 
R1. 

*The set of column 4 values for all data tuples in a relation, R2, 
which have the same value in column 3 of the data tuple in R2 
under investigation (column 3 is a "grouper" column) must be 
contained in a similarly defined set for the currently selected 
data tuple of a relation R1 ("set join", see [1]). 
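
The example constraints above can be written as simple predicates. The sketch below assumes that a data tuple is a Python tuple and that the column numbers in the text are 1-based; the helper names (`col`, `selection_ok`, `linkage_ok`, `group_values`, `set_join_ok`) are invented for illustration and do not come from the article.

```python
def col(data_tuple, n):
    """Return column n of a data tuple, counting columns from 1."""
    return data_tuple[n - 1]


def selection_ok(t):
    """Intra-tuple test: column 3 equals 10 and column 5 > column 2."""
    return col(t, 3) == 10 and col(t, 5) > col(t, 2)


def linkage_ok(t_r2, t_r1):
    """Inter-tuple test: column 3 of the R2 tuple > column 5 of the R1 tuple."""
    return col(t_r2, 3) > col(t_r1, 5)


def group_values(relation, t, grouper=3, member=4):
    """Set of `member` column values over all tuples sharing t's grouper value."""
    key = col(t, grouper)
    return {col(u, member) for u in relation if col(u, grouper) == key}


def set_join_ok(r2, t_r2, r1, t_r1):
    """Set join: the R2 group's column 4 values are contained in the R1 group's."""
    return group_values(r2, t_r2) <= group_values(r1, t_r1)
```

A selection test needs only the candidate tuple, whereas both linkage tests also need the tuple currently chosen from the other relation, which is why linkage tests can be applied only after a higher query tuple has been resolved.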

Search List 

Prior to initiating the n-ary join operation, the query tuples are 
organized in a list called the Search List. This list determines the 
order in which the query tuples and their underlying data relations are 
processed, with the query tuple at the top being processed first. Explicit 
join query tuples are not placed on the Search List. If explicit join 
tuples exist in a query, their "print" elements are temporarily 
transferred to the appropriate normal query tuples. 

N-ary Join 

The n-ary join will be described using the flow charts of Figs. 1 
and 2. 

Fig. 1 shows the top level action involved in selecting the current 
query tuple for which a solution-satisfying data tuple is to be retrieved 
from its underlying relation. In this figure: 

*CQT is the current query tuple, which is that tuple on the Search 
List which is being processed during a given pass through the 
flow chart. 

*TOL is the query tuple at the top of the Search List. 

*BOL is the query tuple at the bottom of the Search List. 

*The Retrieve Mode setting dictates whether the next data tuple 
retrieval search should be started at the beginning of the relation 
('Start') or continued from the last retrieval made from the subject 
relation ('Resume'). 

*Retrieve Next Data Tuple From CQT returns with Retrieve Successful 
= 'Yes' when the first data tuple which satisfies the 
selection and linkage conditions is found, or Retrieve Successful 
= 'No' if the end-of-relation was reached. 

*Whenever a satisfactory data tuple is found for the BOL query 
tuple, the print elements in the current data tuples are mapped 
into output relations and the retrieval resumes scanning with that 
"print" query tuple which is lowest on the Search List. 

Fig. 2 shows the action associated with the data tuple retrieval on 
the relation which underlies the current query tuple. In this figure: 

*CDT is the data tuple pointed to currently. 

*If the Retrieve Mode = 'Start' then CDT is initially set to the top 
of the relation. (If inversions had existed in a direct access 
system, an equality restriction, for example, could be associated 
with a list of valid data tuple IDs and CDT would be set to the 
top of that list.) 

*If Retrieve Mode = 'Resume' then CDT is incremented from its current 
data tuple to the next data tuple in the relation or the next data 
tuple available in the data tuple ID list (if such a list has 
been created). Moreover, the CDT for all query tuples must be 
stored on a continuing basis. 

*If the attempt to acquire the next data tuple results in an 
end-of-relation (EOR = 'Yes'), this procedure returns Retrieve 
Successful = 'No'. 

*Candidate data tuples are first tested to determine if they pass 
selection constraints. They are then tested for inter-tuple 
dependencies with the previously selected data tuples (one per 
query tuple) from the relations that underlie those query tuples 
which are above the current query tuple on the Search List. 
(Certain types of linkage tests can be eliminated. For example, if 
a query tuple has the same linkage test involving two or more 
query tuples (which are higher on the Search List), only one of 
these linkage tests is necessary.) 

*If the candidate data tuple fails either the selection or linkage 
tests, CDT is incremented to the next available data tuple. 
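
The control flow described for Figs. 1 and 2 can be sketched as a single loop with one stored cursor (CDT) per query tuple. This is a simplified reading under stated assumptions, not the authors' implementation: relations are in-memory lists, no inversions are used, each Search List entry is a hypothetical `(relation, selection, linkage)` triple, and "mapping print elements into output relations" is reduced to emitting the data tuples chosen for the query tuples down to the lowest print query tuple.

```python
def nary_join(search_list, lowest_print):
    """Sketch of the n-ary join.

    search_list  -- list of (relation, selection, linkage), top (TOL) first;
                    selection(t) -> bool, linkage(t, chosen_above) -> bool
    lowest_print -- index of the lowest query tuple with print elements
    """
    n = len(search_list)
    cursor = [-1] * n      # CDT for every query tuple, kept between passes
    chosen = [None] * n    # data tuple currently selected per query tuple
    results = []
    cqt = 0                # start at TOL in 'Start' mode (cursor == -1)
    while True:
        relation, selection, linkage = search_list[cqt]
        # Retrieve Next Data Tuple From CQT: advance CDT to the next
        # tuple passing the selection test and then the linkage test.
        i = cursor[cqt] + 1
        while i < len(relation) and not (
            selection(relation[i]) and linkage(relation[i], chosen[:cqt])
        ):
            i += 1
        cursor[cqt] = i
        if i < len(relation):              # Retrieve Successful = 'Yes'
            chosen[cqt] = relation[i]
            if cqt == n - 1:               # CQT is BOL: emit a solution
                results.append(tuple(chosen[: lowest_print + 1]))
                cqt = lowest_print         # 'Resume' at lowest print tuple
            else:
                cqt += 1                   # next lower query tuple
                cursor[cqt] = -1           # 'Start' mode for that relation
        else:                              # end-of-relation reached
            if cqt == 0:                   # CQT is TOL: finished
                return results
            cqt -= 1                       # 'Resume' at next higher tuple
```

Resuming at the lowest print query tuple, rather than at the bottom of the list, is what lets the scan of the lower relations stop at the first satisfying combination.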

Ordering of Search List 

The order of the query tuples in the Search List has a significant 
effect on the amount of processing required to find a solution to a 
query. Three criteria are used to determine the order of the Search 
List: 

(1) Cardinality. Query tuples associated with the smallest cardinality 
relations are placed at the top of the list and those associated 
with large relations are placed at the bottom. 

(2) Print elements. An attempt is made to order the list so that the 
lowest query tuple with a print element is as high as possible in 
the list. This is because when a data tuple for the lowest print 
query tuple is processed, the processing of lower query tuples can 
be terminated after the first set of data tuples satisfying all 
selection and linkage constraints has been found. 

(3) Linkage. An attempt is made to order the list such that the tuples 
at the top of the list have the maximum number of inter-tuple linkage 
constraints. This will tend to reveal inadmissible data tuples 
as soon as possible, and avoid processing of lower level query 
tuples and their associated relations. 

For any given query the orderings determined by each of these 
criteria will generally be different. It is thus necessary to arrive at 
a compromise ordering which weights the importance of these three 
criteria. We use the following algorithm to arrive at an ordering 
according to these three criteria. 

(1) Each query tuple is assigned a number. If it does not have print 
elements, this number is its cardinality. If it has print elements, 
the number is its cardinality divided by 4. (Four is a "print 
factor".) 

(2) The query tuple with the smallest number is assigned to the top of 
the Search List. 

(3) The next position on the Search List is assigned to the query tuple 
which has the maximum number of linkage constraints with tuples 
already on the list. In case of a tie, the tuple with the smallest 
cardinality number (from step 1) is selected. 

(4) Repeat step 3 until all query tuples are assigned to the Search 
List. 
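
The four steps above can be sketched directly. The input layout is an assumption made for this illustration: `tuples` maps a query tuple name to `(cardinality, has_print)`, and `links` maps an unordered pair of names to the number of linkage constraints between them. Breaking any remaining tie by name is also an assumption, added only to make the result deterministic.

```python
def order_search_list(tuples, links, print_factor=4):
    """Return query tuple names ordered from top to bottom of the Search List."""
    # Step 1: cardinality, divided by the print factor for print tuples.
    number = {
        name: (card / print_factor if has_print else card)
        for name, (card, has_print) in tuples.items()
    }
    remaining = set(tuples)

    # Step 2: the smallest number goes to the top of the list.
    first = min(remaining, key=lambda q: (number[q], q))
    order = [first]
    remaining.remove(first)

    # Steps 3 and 4: repeatedly take the tuple with the most linkage
    # constraints to tuples already placed; ties go to the smaller number.
    while remaining:
        def n_links(q):
            return sum(links.get(frozenset((q, p)), 0) for p in order)

        nxt = min(remaining, key=lambda q: (-n_links(q), number[q], q))
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    tuples = {"A": (100, False), "B": (200, True),
              "C": (30, False), "D": (500, False)}
    links = {frozenset(("C", "D")): 1,
             frozenset(("B", "D")): 1,
             frozenset(("A", "B")): 2}
    print(order_search_list(tuples, links))
```

In this example C has the smallest number and is placed first, and D follows despite its large cardinality because it is the only tuple linked to C.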

Optional Selection Preprocessing 

For query tuples other than the one at the top of the Search List, 
it will sometimes be beneficial to perform a preliminary scan of the 
underlying relation (using inversions when possible) to produce a smaller 
(in degree and cardinality) temporary relation containing only the data 
tuples which satisfy the intra-tuple constraints. The scans which would 
occur many times over each such relation will then be over the smaller 
temporary relation rather than the larger base relation. This will be 
helpful when the existing inversions on the base relations cannot be 
used to process inter-tuple constraints. 
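
A minimal sketch of this preprocessing step follows, assuming an in-memory relation and no inversions. The name `preprocess` and the `needed_columns` parameter (0-based positions of the columns that later linkage tests or print elements still require) are hypothetical.

```python
def preprocess(relation, selection, needed_columns):
    """Build a temporary relation smaller in cardinality and degree.

    Rows failing the intra-tuple (selection) test are dropped, and only
    the columns listed in needed_columns are kept.
    """
    return [
        tuple(t[c] for c in needed_columns)
        for t in relation
        if selection(t)
    ]


if __name__ == "__main__":
    base = [(1, "x", 10), (2, "y", 7), (3, "z", 10)]
    # Keep rows whose third column equals 10; retain only the first column.
    print(preprocess(base, lambda t: t[2] == 10, needed_columns=[0]))
```

Linkage tests written against the base relation would have to be re-expressed in terms of the reduced column positions.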

[1] K. E. Niebuhr, K. V. Scholz and S. E. Smith, "Algorithm for 
Processing Query By Example", IBM Technical Disclosure Bulletin, July 
1976, Vol. 19, No. 2, pp. 736-741. 

[2] M. M. Zloof, "Query By Example", IBM Research Report RC 4917, 
July 1974. 